Tiny Curaçao, the smallest nation ever to qualify for a World Cup, clinched their maiden point on Saturday after battling to a 0-0 draw against Ecuador in Kansas City thanks to an outstanding performance from goalkeeper Eloy Room.
Israel seized Hezbollah underground command center in southern Lebanon
submitted by /u/Boston-Bets
‘Humans, we have arrived!’ Brazilians receive alien invasion alerts
Thousands of Brazilians across multiple states were shocked by emergency alerts sent to their cell phones in the middle of the night.
Read Full Article at RT.com
How NOT to Train an Offensive Security AI Agent
Last week I spent more time and money than I’m willing to admit trying to make a small AI model very good at CTFs (capture-the-flag challenges).
Specifically, I trained it on a benchmark I created, TarantuBench, which measures the offensive capabilities of AI models using interactive cyber puzzles. Each puzzle has a unique solution, so whether the model succeeded can be verified with a direct check.
My thesis: if the benchmark measures cyber capabilities, then perhaps a model can be trained on it to solve such puzzles better.
The answer?
Maybe
Of course, I started the hard way: I set up a server on Google Cloud where the model would repeatedly attempt these puzzles and learn from its successes and mistakes. GRPO, for those wondering.
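For readers unfamiliar with GRPO, its core idea can be sketched briefly: sample several attempts per puzzle, score each with the benchmark’s pass/fail check, and rate every attempt relative to the others in its group. The sketch below covers only that group-relative advantage step, not the full clipped policy-gradient loss or KL penalty. The function name and the binary reward are illustrative assumptions, not the author’s implementation, and it uses the population standard deviation (implementations differ on this detail).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each rollout relative to its group (all attempts at the same puzzle).

    Illustrative sketch of GRPO's advantage step only: subtract the group mean
    and divide by the group's (population) standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical example: 4 attempts at one puzzle,
# reward 1.0 if the solution check passes, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advs = group_relative_advantages(rewards)  # roughly [1, -1, -1, 1]
```

When every attempt in a group fails (or every attempt succeeds), all advantages are zero and that group contributes no learning signal, which is one way sparse solve/fail rewards can make this setup hard to get working.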
It didn’t work, for an engineering reason: I wasn’t convinced that my implementation of the algorithm for my benchmark was correct.
So I switched to a simpler method: I ran the model on the entire benchmark, collected all its solutions, and tried to train it to keep solving that way rather than in the ways that led to errors. SFT, of course.
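This filtered-SFT step amounts to keeping only the successful runs and using them as training targets. A minimal sketch, assuming a hypothetical `Episode` record with a `lab_id`, the chat `messages`, and a `solved` flag set by the benchmark’s check (none of these names come from TarantuBench itself):

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    """Hypothetical record of one agent run on one lab."""
    lab_id: str
    messages: list = field(default_factory=list)  # chat turns, e.g. {"role": ..., "content": ...}
    solved: bool = False  # set by the benchmark's direct solution check


def build_sft_dataset(episodes):
    """Keep only episodes that passed the check (rejection-sampling-style SFT data)."""
    return [
        {"lab_id": ep.lab_id, "messages": ep.messages}
        for ep in episodes
        if ep.solved
    ]
```

The filter says nothing about how many distinct labs the surviving episodes cover, which matters once the data is split for evaluation.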
Two problems:
First, the data I built wasn’t good, and it took me (too) long to figure that out. I used the solutions as they were, without thinking much about how to feed them back to the model so that it would actually learn something from them.
Second, I didn’t have enough data: I hadn’t run the model on the benchmark enough times. At that point, between paying for Google Cloud, the model, and Cursor, I decided to stop investing in the experiment.
The result: every time I trained the model, it failed to exceed its original performance, and sometimes it even got worse.
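On the first problem, how solutions are fed back to the model, one common option is to train only on the agent’s own turns and mask out prompts and tool output. The sketch below illustrates that option and is not the author’s pipeline. The `(role, token_ids)` turn format and the `build_labels` name are assumptions; `-100` matches the default `ignore_index` of PyTorch’s `CrossEntropyLoss`.

```python
# PyTorch's CrossEntropyLoss skips targets equal to this value by default.
IGNORE_INDEX = -100


def build_labels(turns):
    """Flatten a tokenized multi-turn episode into (input_ids, labels).

    turns: list of (role, token_ids) pairs (hypothetical format).
    Only assistant tokens keep their ids as labels; user/tool tokens are masked,
    so the loss is computed on the agent's own actions only. Labels are aligned
    position-by-position with input_ids; any next-token shift is assumed to
    happen inside the training loop.
    """
    input_ids, labels = [], []
    for role, ids in turns:
        input_ids.extend(ids)
        labels.extend(ids if role == "assistant" else [IGNORE_INDEX] * len(ids))
    return input_ids, labels
```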
What did I learn?
Don’t train on solvers alone. Oracle scripts ≠ agent policy.
Don’t count solves without counting labs. 450 solves on 2 labs is not abundance.
Don’t distill a strong teacher into a weak student without student rollouts. Cross-model SFT is few-shot transfer.
Don’t expect fork rows to replace episodes. Prefix→decision pairs don’t teach horizon control.
Don’t augment your way out of n≈10. Grounding filters and replay repair are hygiene, not data.
Don’t split by run when labs repeat. Lab-disjoint or don’t report generalization.
Don’t chase chains before val singles lift. Composition needs components.
Don’t trust train loss. Track val solve rate and per-lab regressions against base.
Don’t skip the base arm. Every SFT eval should log base=SOLVED|FAIL per lab.
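Several of the checks above (solves per lab, lab-disjoint splits, per-lab regressions against the base model) are easy to automate. A minimal sketch, assuming each episode is a dict with hypothetical `lab_id` and `solved` keys and that eval results are stored as `{lab_id: "SOLVED" | "FAIL"}`:

```python
import random
from collections import Counter


def solves_per_lab(episodes):
    """Count solved episodes per lab: many solves on few labs is still little data."""
    return Counter(ep["lab_id"] for ep in episodes if ep["solved"])


def lab_disjoint_split(episodes, val_fraction=0.2, seed=0):
    """Split by lab, not by run, so no lab appears in both train and val."""
    labs = sorted({ep["lab_id"] for ep in episodes})
    rng = random.Random(seed)
    rng.shuffle(labs)
    n_val = max(1, int(len(labs) * val_fraction))  # always hold out at least one lab
    val_labs = set(labs[:n_val])
    train = [ep for ep in episodes if ep["lab_id"] not in val_labs]
    val = [ep for ep in episodes if ep["lab_id"] in val_labs]
    return train, val


def solve_rate(results):
    """Fraction of labs marked SOLVED in a {lab_id: outcome} dict."""
    if not results:
        return 0.0
    return sum(v == "SOLVED" for v in results.values()) / len(results)


def per_lab_regressions(base_results, sft_results):
    """Labs the base model solved but the fine-tuned model failed."""
    return sorted(
        lab
        for lab, outcome in base_results.items()
        if outcome == "SOLVED" and sft_results.get(lab) == "FAIL"
    )
```

With 450 solves on 2 labs, `solves_per_lab` makes the concentration obvious at a glance, and this lab-disjoint split would leave only one lab on each side. Comparing `solve_rate` alone can hide a regression that `per_lab_regressions` surfaces.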
What does this mean?
That the experiment was unsuccessful, not that my thesis is wrong. I don’t plan to end this saga here, but I’m taking a short break and sharing with you what *not* to do when you set out to train models.
Stay tuned, I’ll try again soon.
Full experiment at tarantulabs.com
submitted by /u/dvnci1452
Iran says it is closing Strait of Hormuz, testing fragile agreement with U.S. – The Washington Post
Supreme Court of Nepal rules in favour of marriage equality
submitted by /u/After-Professional-8
6/20: CBS Weekend News
Vice President JD Vance heads to Switzerland for peace talks with Iran; President Trump defends his beautification push in Washington, D.C.
Ukraine war live: Zelensky warns of ‘massive attack’ from Moscow – The Independent
Backstage at Gorillaz’ epic, one-off stadium show: ‘The vibe is ridiculous’
Damon Albarn, De La Soul and Moonchild Sannelly talk backstage as Gorillaz play their biggest show.
Texas Supreme Court denies request to save beach from SpaceX
Environmental groups say the unanimous decision elevates the company’s interests over the rights of Lone Star State residents.