Three forex bots, fifteen production bugs, and one hard lesson about the gap between a good backtest and a profitable live account. Here's what happened — in order.
The bot's backtest said PF 1.40. The live account's first 120 trades said PF 0.46. Closing that gap is what this journey has been about.
A friend of mine turned a small deposit into roughly $300k over a few years of careful, mostly-manual forex trading. I wanted to understand what he was doing well enough to automate a version of it on my own laptop — no VPS, no prop firm, no subscription signal service. Just a script that reads the market and takes the trades a disciplined person would.
The first real deployment was a ~4,200-line Python script running against an OANDA live micro-account with $297. Simple rules: EMA crossover with RSI confirmation, fixed percent risk, nothing fancy. It taught me more about spreads, slippage, and session liquidity in two months than any YouTube video.
V3 moved to an OANDA practice account with a sizeable demo balance so I could stress test features that need more room: six strategies, session filters, regime detection, per-pair blocks, ML entry filters, confluence scoring. The codebase grew to ~7,200 lines with a PyQt desktop GUI and rolling logs.
The backtest looked good. The live account didn't.
After 17 days of live trading, V3 had closed 45 trades with a profit factor of 0.54 — a losing system. I paused the bot and replayed every one of those trades in a simulator with the same candles, the same signals, but with a different exit: a wide stop, a break-even shield, and a fixed 2.75R take-profit instead of a tight trailing stop.
Same 45 entries. Different exits. PF 1.60. The edge was always there — the exit was giving it all back.
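That replay is easy to sketch. The exit models and per-trade MFE (maximum favorable excursion) numbers below are illustrative assumptions, not the bot's real data; the point is that identical entries produce a losing or a winning profit factor depending purely on the exit.

```python
# Illustrative replay: same entries, two exit policies, different PF.
# MFE_R = each trade's maximum favorable excursion in R (assumed numbers).
MFE_R = [3.2, 0.6, 1.4, 0.1, 4.0, 0.9, 2.9, 0.3, 1.1, 0.0]

def tight_trailing(mfe):
    # Tight trail: small moves stop out at -1R, winners keep only a slice.
    return mfe * 0.25 if mfe >= 0.8 else -1.0

def wide_shield_tp(mfe):
    # Wide stop, break-even shield armed at +1R, fixed 2.75R take-profit.
    if mfe >= 2.75:
        return 2.75      # rode to the fixed target
    if mfe >= 1.0:
        return 0.0       # shield turned a fade into a scratch
    return -1.0          # never armed: full wide-stop loss

def profit_factor(results):
    gains = sum(r for r in results if r > 0)
    losses = -sum(r for r in results if r < 0)
    return gains / losses if losses else float("inf")

pf_tight = profit_factor([tight_trailing(m) for m in MFE_R])  # below 1.0
pf_wide = profit_factor([wide_shield_tp(m) for m in MFE_R])   # above 1.0
print(f"tight={pf_tight:.2f} wide={pf_wide:.2f}")
```

Same ten entries, two exits, and the system flips from losing to winning. That is the whole diagnosis in miniature.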
The fix rolled out in three phases: backtest the new exit, replay it in simulation, then ship it live.
Position sizing uses a flat 1% risk calculated on the wide stop distance, so the dollar risk stays constant even as the pip stop gets wider. It was backtested the same way, simulated the same way, and shipped live unchanged.
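A minimal sketch of that sizing rule (the helper name and pip value are assumptions for illustration, not the bot's actual code):

```python
def units_for_flat_risk(balance, stop_pips, pip_value_per_unit, risk_pct=0.01):
    """Size a position so the dollar risk at the (wide) stop is always
    risk_pct of the balance, no matter how many pips away the stop is."""
    dollar_risk = balance * risk_pct                 # constant per trade
    risk_per_unit = stop_pips * pip_value_per_unit   # loss per unit at the stop
    return int(dollar_risk / risk_per_unit)

# A wider stop means fewer units, but the identical dollar risk:
print(units_for_flat_risk(85_733, 40, 0.0001))  # 40-pip stop
print(units_for_flat_risk(85_733, 80, 0.0001))  # 80-pip stop: half the units
```

Doubling the stop distance halves the unit count, which is exactly why the 500,000-unit cap described below was able to silently break the rule.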
Rolling the config out wasn't one commit; it was fifteen. Each one closed a subtle mismatch between what the simulator did and what the live execution actually did. Any single one of them was enough to turn a winning config into a losing one. A sample:
Legacy break-even code fired at +3 pips and moved the stop, overriding the shield that was supposed to arm at +10.
An old "take half off at 60% of TP" block was trimming every winner in half.
A favorable-excursion rule was swapping the fixed TP back to the old trailing one on winning trades.
A 10-pip cap on stops was silently narrowing the wide stop back to the old one on volatile pairs.
Kelly-VAPS, ML confidence, and streak multipliers compounded — on losing days, 1% risk became 0.47%.
A config override of {} didn't clear the default (an empty dict is falsy), so a stale pair-risk override kept leaking through.
On an $85k balance, the 500,000-unit position cap silently halved every trade's real risk to 0.5%, which would have halved the expected return.
Plus pyramiding logic, tick-level proportional break-even, two separate MFE-tracking paths, an unscoped local variable, and a missing account fallback. The full write-up walks through every one.
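The empty-dict override bug above is a classic Python gotcha worth spelling out: an empty dict is falsy, so a truthiness check can't distinguish {} (clear everything) from None (nothing supplied). A hypothetical reduction, with illustrative names rather than the bot's real config API:

```python
STALE_DEFAULTS = {"EUR_GBP": 0.005}  # a leftover per-pair risk override

def merged_risk_buggy(override):
    # BUG: `or` treats the empty dict as "no override supplied",
    # so passing {} to clear the defaults silently keeps them.
    return override or STALE_DEFAULTS

def merged_risk_fixed(override=None):
    # FIX: fall back only when the caller passed nothing at all.
    return STALE_DEFAULTS if override is None else override

print(merged_risk_buggy({}))  # {'EUR_GBP': 0.005} -- stale value leaks
print(merged_risk_fixed({}))  # {} -- the defaults are really cleared
```

The `is None` check is the whole fix; `or`-based fallbacks on containers are worth grepping for in any config loader.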
With all 15 bugs fixed and a dry run passing on an $85,733 balance (all 7 pairs sizing at exactly 1.00% risk), V3 went live Sunday evening. The 7-day tab on the Journal page is watching this run in real time.
V3's weakness isn't the strategy anymore — it's the Python GIL and the ~30ms latency between a price tick and a decision. V4 rebuilds the data plane in Rust: async OANDA streaming, lock-free indicator updates, sub-millisecond decide() latency, the same PyQt GUI reading shared state through JSON files and SQLite.
V4 runs in shadow mode first: every signal is logged as if the bot had traded it, but no orders are actually submitted. This lets us validate against live prices for a week before flipping execute_trades = true.
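The gate itself is simple. A hedged Python sketch of the idea (the function and file names are assumptions; V4's real data plane is Rust):

```python
import json
import time

EXECUTE_TRADES = False  # flip only after a week of shadow validation

def handle_signal(signal: dict, submit_order, log_path="shadow_log.jsonl"):
    """Log every signal as if it had been traded; submit only when live."""
    record = {"ts": time.time(), **signal}
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")  # the shadow trade log
    if EXECUTE_TRADES:
        return submit_order(signal)  # live path: order goes to the broker
    return None                      # shadow mode: nothing is submitted
```

Because the log line is written before the execute check, the shadow log and the live log share an identical format, so the week of shadow signals can be diffed directly against live fills later.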
The page you came from (signals.html) used to advertise paid signal subscriptions on the strength of a backtest. The live numbers didn't back that up, so instead of selling signals, we're publishing the full trade log.
If the fix holds for 30 days, the headline number on that page will earn itself. If it doesn't, we'll tell you what we changed. Either way, every trade is visible.
Our backtest and our live bot ran different code. Same repo, same config file — different code paths. The only way to catch that is to replay real trades against the live engine, not just against a tidy simulator loop.
The same 45 entries produced a losing system with one exit and a winning one with another. We spent months tuning signal filters when the leverage was in the exit logic.
Dynamic risk scaling sounds responsible. In practice it meant our losing-streak trades were half-sized by the time the winning trade finally showed up. A flat 1% is boring, and boring is profitable.
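The compounding is easy to see with concrete numbers. The individual multipliers below are assumptions for illustration; the 0.47% endpoint is the real figure from our logs:

```python
base_risk = 0.01      # the flat 1% we thought we were risking
kelly_vaps = 0.75     # drawdown-scaled Kelly multiplier (assumed)
ml_confidence = 0.85  # ML entry-filter confidence multiplier (assumed)
streak = 0.737        # losing-streak reducer (assumed)

# Each scaler looks harmless alone; multiplied together they halve the size.
effective = base_risk * kelly_vaps * ml_confidence * streak
print(f"{effective:.4%}")  # roughly 0.47% -- half-sized before the winner
```

Three "conservative" multipliers, each individually defensible, and the trade that was supposed to recover the streak is taken at less than half size.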
EUR/GBP lost 6 straight on our account. Rather than re-tune it, we blocked it — and the next winning trade somewhere else paid for the analysis. Blocking a pair is cheap; re-tuning is expensive.
I'm writing this up in more detail — one post per phase, with the real code diffs and the real numbers.