submitted by /u/National-Charity-435 [link] [comments]
Trump blames ‘vandals’ for Reflecting Pool damage as former Olympian arrested
US President Donald Trump on Saturday announced that federal authorities had made “multiple arrests” of people he said were vandalising the Reflecting Pool
A look at Jane Street’s push to supercharge trading with AI and become a major AI investor; it invested $1B in CoreWeave in April and has a stake in Anthropic (Gregory Zuckerman/Wall Street Journal)
Gregory Zuckerman / Wall Street Journal:
A look at Jane Street’s push to supercharge trading with AI and become a major AI investor; it invested $1B in CoreWeave in April and has a stake in Anthropic — The firm has surged from a han…
Prohibition hits European summer: France bans booze at music festival as heatwave hits 41°C
World News, Today World News, Latest International News, World Breaking News, Trending News of World – Times of India World News, Today World News, Latest International News, World Breaking News, Trending News of World – Times of India https://timesofindia.indiatimes.com/world GlobalNewsBot GlobalNewsBot
‘250-ft gash, poured chemicals’: Trump claims vandals damaged Reflecting Pool
World News, Today World News, Latest International News, World Breaking News, Trending News of World – Times of India World News, Today World News, Latest International News, World Breaking News, Trending News of World – Times of India https://timesofindia.indiatimes.com/world GlobalNewsBot GlobalNewsBot
World Cup debutants Curaçao clinch historic first point after holding Ecuador to draw
Tiny Curaçao, the smallest nation ever to qualify for a World Cup, clinched their maiden point on Saturday after battling to a 0-0 draw against Ecuador in Kansas City thanks to an outstanding performance from goalkeeper Eloy Room.
King Charles III to reveal personal tax bill in historic first for UK monarch
Britain’s King Charles III will become the first monarch to share his personal tax information in a bid to improve transparency, Buckingham Palace told UK media on Saturday. British royalty has no obligation to disclose their tax bills, but recent sca…
Israel seized Hezbollah underground command center in southern Lebanon
submitted by /u/Boston-Bets [link] [comments]
‘Humans, we have arrived!’ Brazilians receive alien invasion alerts
Thousands of Brazilians across multiple states were shocked by emergency alerts sent to their cell phones in the middle of the night
Read Full Article at RT.com
How NOT to Train an Offensive Security AI Agent
Last week I spent more time and money than I’m willing to admit trying to make a small AI model very good at CTFs.
Specifically, training it based on the benchmark I created – TarantuBench. That benchmark measures the offensive capabilities of artificial intelligence models using interactive cyber puzzles. Each such puzzle has a unique solution, so you can gauge whether the model succeeded or not through a direct check.
My thesis is the following – if the benchmark measures cyber capabilities, then perhaps it is possible to train a model based on it to perform such puzzles better.
The answer?
Maybe
Of course, I started the hard way. I set up a server in Google’s cloud where the model would try to solve these puzzles over time, and learn from its mistakes and successes. GRPO, for those wondering.
It didn’t work for an engineering reason – I wasn’t convinced that my implementation of this algorithm for the benchmark I built was correct.
I switched to a simpler method. I let the model run on the entire benchmark, took all its solutions, and tried to train it to continue solving in that way and not in another way that leads to errors. SFT of course.
Two problems:
First of all, the data I built wasn’t good. It took me (too) long to figure it out. I took the solutions as they were, without thinking too much about how I would re-feed them to the model so that it would really understand something from this data.
Then, I realized that I didn’t have enough data. I didn’t run the model enough times on the benchmark. At this point, between payments to Google’s cloud, for the model, and for Cursor, I decided that I would end my investment in the experiment.
The result is that every time I trained the model, it failed to exceed its original performance, and sometimes even deteriorated.
What did I learn?
Don’t train on solvers alone. Oracle scripts ≠ agent policy.
Don’t count solves without counting labs. 450 solves on 2 labs is not abundance.
Don’t distill a strong teacher into a weak student without student rollouts. Cross-model SFT is few-shot transfer.
Don’t expect fork rows to replace episodes. Prefix→decision pairs don’t teach horizon control.
Don’t augment your way out of n≈10. Grounding filters and replay repair are hygiene, not data.
Don’t split by run when labs repeat. Lab-disjoint or don’t report generalization.
Don’t chase chains before val singles lift. Composition needs components.
Don’t trust train loss. Track val solve rate and per-lab regressions against base.
Don’t skip the base arm. Every SFT eval should log base=SOLVED|FAIL per lab.
What does this mean?
That the experiment was unsuccessful – not that my thesis is wrong. I don’t plan to end this saga here, but I will take a short break and am sharing with you what *not* to do when you approach training models.
Stay tuned, I’ll try again soon.
Full experiment at tarantulabs.com
submitted by /u/dvnci1452
[link] [comments]