Stop the Flap: Practical FortiGate SD‑WAN Tuning for Stable Internet

The Story

A ticket arrives: users report intermittent internet outages. The firewall shows both links up, monitoring dashboards are green, and server speed tests look normal—yet users experience multiple daily disconnects. The timing coincides with a recent firewall installation and a simple SD‑WAN configuration using two ISPs. During troubleshooting, the SD‑WAN logs reveal a flood of alternating messages: “route removed due to SLA failure” followed by “route added back into route table.” After tuning the SD‑WAN settings, the instability disappears. The follow‑up—how do we prevent this from happening again—has an unsurprising answer: follow best practices and test.

Step-By-Step Guide

  • Prepare and baseline your WAN links. Document the following for each tunnel and ISP coming into the FortiGate:

    • Bandwidth

    • SLA Expectations

    • Labels (if they're not labelled, LABEL THEM)

    • Correct IP address, gateway, subnet mask, and any specific routes

  • Test each link for latency (ping), jitter, packet loss, and throughput. You can use iperf for throughput, or speedof.me for internet speed tests.

  • Create SD-WAN Zones and add members

    • Network > SD-WAN > SD-WAN Zones

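    • The biggest gotcha here is interfaces that are already in use. FortiOS won't let you add an interface as an SD-WAN member while it's still referenced elsewhere (for example, by firewall policies or static routes). Follow the instructions here (link TBD) to get around this issue.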

    • Create a zone for internet (I label it WAN, or Internet)

    • Create a zone for each tunnel group (e.g. Chicago for all tunnels going from this site to the Chicago site)

    • Add each interface as a member of the appropriate zone (e.g. add the Comcast ISP interface to the Internet SD-WAN zone)

    • Assign the correct bandwidth values (upload and download). If you don't have these numbers, make a note to find them and update the values once you have them.

  • Build realistic Performance SLAs

    • This part is SUPER important! Building unrealistic SLAs makes SD-WAN effectively useless.

    • Create SLAs for different traffic types (e.g. General Internet, VoIP, Microsoft SaaS, etc.)

    • Use the appropriate probe type (e.g. Ping for general internet, HTTP for checking a specific web application, TCP for VoIP or other software that doesn't host a web server)

    • Set thresholds based on real measurements. This is important! Your DSL in the middle of Wyoming will not get 5ms latency, and SD-WAN should not remove the route when it goes above 10ms.

    • Avoid overly tight latency/jitter values. They cause flapping and a poor user experience (and lots of tickets).

  • Assign SLA participants

    • For each SLA, select which members should be tested. Keeping this at default means that ALL SD-WAN members will be tested and can affect traffic.

    • Don't probe links that should never carry traffic.

    • Confirm SLA status shows healthy values and modify performance SLAs as needed.

    • Come back to these metrics in a week and see if you need to tune any.

  • Create intent-based SD-WAN rules

    • Match traffic by application, Internet Service Database (ISDB) entry, or subnet

    • Use strategies aligned with business intent (whatever the users care about)

      • Best Quality -> VoIP, Teams, Zoom, etc.

      • Lowest Cost (SLA) -> Bulk traffic, backups, or anything that's less sensitive to performance

      • Maximize Bandwidth (SLA) -> General internet traffic (yes, including your Pandora stream)

    • Attach the correct SLA to each rule

    • Order the rules from most specific to least specific

  • Tune failover behavior

    • Ensure SLA thresholds reflect real-world internet variance (again... your Wyoming DSL may well climb above 50ms...)

    • Enable route removal when a link is out of SLA. The main gotcha: if your performance SLAs aren't set correctly, or there's no healthy link to fail over to... then traffic just fails.

    • Avoid aggressive failover times that cause oscillation (and angry users)

    • Test both hard failures (link down) and soft failures (high latency/loss). If you haven't tested... then test!

  • Integrate security without breaking SD-WAN

    • Keep firewall policies focused on security.

    • Keep SD-WAN rules focused on path selection.

    • Apply consistent security profiles (don't put web filtering on one ISP and nothing on the other... unless it's a business need!)

    • Ensure policy order doesn't bypass SD-WAN unintentionally

  • Validate SD-WAN decisions

    • Use the SD-WAN monitor to confirm path selection for all intended traffic

    • Check SLA logs for stability. This is important as you may have flapping and not notice it!

    • Run tests! Run them again! Involve users. Your users should have a basic understanding of their business needs, so they should know what works and what doesn't. The best question is "Is this working how you expect and want it to?"

    • Simulate link degradation, then have users test again

  • Monitor and adjust

    • A week later, go and review hit counts on all your rules

    • Watch SLA trends and see what thresholds you need to adjust

    • Send logs to FortiAnalyzer (FAZ) or another SIEM for long-term visibility

  • DOCUMENT IT

    • No really. Document it.

    • If it's not documented then you're not done yet.
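The baselining and SLA steps above come down to turning real measurements into thresholds. Here is a minimal Python sketch of one way to do that, assuming you've recorded per-window latency, jitter, and packet-loss numbers during your baseline tests. The 95th percentile and the 1.5x headroom factor are arbitrary starting points picked for illustration, not Fortinet recommendations, and `derive_thresholds` and the sample numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaThresholds:
    latency_ms: int
    jitter_ms: int
    packet_loss_pct: int


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("need at least one baseline sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def derive_thresholds(latency_ms: list[float], jitter_ms: list[float],
                      loss_pct: list[float], pct: float = 95.0,
                      headroom: float = 1.5) -> SlaThresholds:
    """Threshold = pct-th percentile of the baseline times a headroom factor.

    pct=95 and headroom=1.5 are illustrative assumptions, not vendor guidance.
    """
    return SlaThresholds(
        latency_ms=math.ceil(percentile(latency_ms, pct) * headroom),
        jitter_ms=math.ceil(percentile(jitter_ms, pct) * headroom),
        # Never allow a 0% loss threshold: any single lost probe would trip it.
        packet_loss_pct=max(1, math.ceil(percentile(loss_pct, pct) * headroom)),
    )


# Hypothetical baseline windows for a rural DSL link.
latency = [48, 52, 55, 47, 60, 58, 51, 49, 53, 62]
jitter = [3, 4, 2, 5, 6, 3, 4, 8, 3, 4]
loss = [0, 0, 0, 0.5, 0, 0, 1, 0, 0, 0]

print(derive_thresholds(latency, jitter, loss))
```

With these made-up numbers the sketch suggests about 93ms latency, 12ms jitter, and 2% packet loss as the SLA thresholds: far looser than a 10ms latency target, and much less likely to flap. Re-run it against a fresh week of data when you come back to tune.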

Common Mistakes

  • Using default SLAs without tuning thresholds

  • Mixing dissimilar links in the same rule without intent.

  • Not enabling app control/application identification

  • Over-relying on volume-based load balancing

  • Testing only hard failures (pulling the cable) and skipping soft failures (high latency, jitter, or loss)
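On soft failures: a link can be "up" and still be useless. This hypothetical Python sketch models a soft failure as degraded metrics on a link that never goes down, then checks whether an SLA would catch it. The `Sla` and `Window` classes and every number in it are illustrative assumptions; in a real test you'd degrade the actual link and watch the SLA status on the FortiGate.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Window:
    """One measurement window on a link (values are illustrative)."""
    latency_ms: float
    jitter_ms: float
    loss_pct: float


@dataclass(frozen=True)
class Sla:
    latency_ms: float
    jitter_ms: float
    loss_pct: float

    def passes(self, w: Window) -> bool:
        # A window is in SLA only if every metric is within its threshold.
        return (w.latency_ms <= self.latency_ms
                and w.jitter_ms <= self.jitter_ms
                and w.loss_pct <= self.loss_pct)


def degrade(w: Window, latency_ms: float = 0, jitter_ms: float = 0,
            loss_pct: float = 0) -> Window:
    """Soft failure: the link stays up but one or more metrics get worse."""
    return replace(w,
                   latency_ms=w.latency_ms + latency_ms,
                   jitter_ms=w.jitter_ms + jitter_ms,
                   loss_pct=min(100.0, w.loss_pct + loss_pct))


voip_sla = Sla(latency_ms=150, jitter_ms=30, loss_pct=2)  # hypothetical thresholds
healthy = Window(latency_ms=55, jitter_ms=4, loss_pct=0)

scenarios = {
    "healthy": healthy,
    "latency spike": degrade(healthy, latency_ms=200),
    "lossy link": degrade(healthy, loss_pct=5),
    "minor jitter": degrade(healthy, jitter_ms=10),
}
results = {name: voip_sla.passes(w) for name, w in scenarios.items()}
for name, ok in results.items():
    print(f"{name:<14} {'in SLA' if ok else 'OUT of SLA'}")
```

The "minor jitter" case staying in SLA is deliberate: a threshold with some headroom should ride out small blips, while the latency spike and the lossy link (both with the interface still up) should push traffic elsewhere.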

Conclusion

In short: SD‑WAN stability is rarely fixed by magic—it's fixed by measurement, sensible thresholds, and careful tuning. Baseline each WAN, document everything, choose realistic SLA targets, and make your probes and failover logic reflect real user experience (not just idealized pings). When you see route flapping, don’t reflexively replace hardware—tune probe intervals, consecutive-failure thresholds, and target endpoints; label and group interfaces; and validate failover behavior with controlled tests. Finally, treat tuning as an iterative process: start conservative, monitor for a week under normal load, then tighten settings only when metrics show you can. Do this and you’ll turn “mystery disconnects” into predictable, manageable behavior.
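The point about consecutive-failure thresholds is easiest to see in a toy model. This Python sketch is a simplified hysteresis model, not FortiOS's actual SLA algorithm: a link is marked out of SLA only after `fail_after` consecutive bad probes and restored only after `recover_after` consecutive good ones. FortiOS health checks do have related settings (such as `interval`, `failtime`, and `recoverytime`), but how closely they map onto this model is an assumption. The function name and probe data are hypothetical.

```python
def route_changes(probe_ok: list[bool], fail_after: int, recover_after: int) -> int:
    """Count route removals plus re-additions for a stream of probe results.

    probe_ok[i] is True when probe i met the SLA. The link leaves SLA after
    `fail_after` consecutive bad probes and returns after `recover_after`
    consecutive good probes (a simplified hysteresis model).
    """
    if fail_after < 1 or recover_after < 1:
        raise ValueError("counts must be at least 1")
    in_sla = True   # assume the link starts healthy
    streak = 0      # consecutive probes that disagree with the current state
    changes = 0
    for ok in probe_ok:
        if ok == in_sla:
            streak = 0  # probe agrees with the current state: reset
            continue
        streak += 1
        if streak >= (fail_after if in_sla else recover_after):
            in_sla = not in_sla  # route removed (or added back)
            changes += 1
            streak = 0
    return changes


# Hypothetical probe results: three isolated blips and one real 8-probe outage.
probes = ([True] * 5 + [False] + [True] * 3 + [False] + [True] * 4
          + [False] * 8 + [True] * 10 + [False] + [True] * 2)

print("1 fail / 1 recover:", route_changes(probes, 1, 1))  # twitchy
print("3 fail / 3 recover:", route_changes(probes, 3, 3))  # tolerant
```

On this made-up probe stream, the 1/1 setting changes the route 8 times, while 3/3 changes it only twice and still catches the real outage. The cost is detection delay: roughly `fail_after` times the probe interval before failover kicks in, which is why the advice is to start conservative and tighten only when metrics show you can.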

Quick checklist to finish the job

  • Baseline each link (latency, jitter, packet loss, throughput) and record SLA expectations.

  • Use reliable, consistent probe targets (ISP or well-distributed public endpoints).

  • Reduce sensitivity to noise: increase the probe interval or the number of consecutive failures required before an SLA is marked as failed.

  • Tune thresholds to reflect real user tolerances (allow small jitter/brief loss).

  • Label interfaces/zones and document routes, weights, and policies.

  • Test failover in a controlled window and monitor for several days before finalizing settings.
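To catch flapping you might not otherwise notice, you can count route-removal messages per hour. A minimal Python sketch, assuming an exported log where each line starts with an ISO-8601 timestamp and contains the message text quoted in the story at the top. Real FortiGate or FortiAnalyzer log formats differ, so treat `parse_line`, the sample lines (including the `wan1` interface name), and the limit of 2 removals per hour as hypothetical and adapt them to your environment.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple

# Message text as quoted in the story; exact wording in your logs may differ.
REMOVED = "route removed due to SLA failure"
ADDED = "route added back into route table"


def parse_line(line: str) -> Optional[Tuple[datetime, str]]:
    """Return (hour bucket, event) for SD-WAN route events, else None."""
    stamp, _, message = line.partition(" ")  # hypothetical "<timestamp> <message>" layout
    if REMOVED in message:
        event = "removed"
    elif ADDED in message:
        event = "added"
    else:
        return None
    hour = datetime.fromisoformat(stamp).replace(minute=0, second=0, microsecond=0)
    return hour, event


def flapping_hours(lines: Iterable[str], max_removals_per_hour: int = 2) -> Dict[datetime, int]:
    """Hours in which the route was removed more often than the limit allows."""
    removals: Counter = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and parsed[1] == "removed":
            removals[parsed[0]] += 1
    return {hour: n for hour, n in removals.items() if n > max_removals_per_hour}


# Hypothetical exported log lines.
sample = [
    "2024-05-01T09:02:11 wan1: route removed due to SLA failure",
    "2024-05-01T09:02:40 wan1: route added back into route table",
    "2024-05-01T09:10:05 wan1: route removed due to SLA failure",
    "2024-05-01T09:10:31 wan1: route added back into route table",
    "2024-05-01T09:31:52 wan1: route removed due to SLA failure",
    "2024-05-01T09:32:20 wan1: route added back into route table",
    "2024-05-01T14:45:00 wan1: route removed due to SLA failure",
]
for hour, count in flapping_hours(sample).items():
    print(f"{hour:%Y-%m-%d %H:00} -> {count} route removals")
```

A single removal in an afternoon is normal failover; three in one hour, as in the 09:00 bucket here, is the kind of pattern that shows up as "mystery disconnects" and calls for looser thresholds.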
