macOS.Gaslight: North Korea-Linked Malware That Tries to Gaslight the Analyst

macOS.Gaslight: DPRK Rust implant for Mac with a prompt injection payload designed to fool AI-based malware analysts. SentinelLabs researchers spotted a Rust-based macOS implant, dubbed macOS.Gaslight, that surfaced in early June after an Apple XProtect update pointed to a VirusTotal sample uploaded on May 22. The binary was undetected by static engines at the time […]

June 26, 2026
Read More >>

Curl Fixes a 25-Year-Old Bug in Its Largest CVE Release Yet

Curl fixed 18 vulnerabilities, including a 25-year-old bug, with issues spanning auth bypass, memory safety, and host validation in libcurl. Curl maintainers addressed eighteen vulnerabilities with a single update, and one of them goes back 25 years. That’s not a typo, it really sat there since the early 2000s. curl is a widely used open-source […]

June 25, 2026
Read More >>

Inside Mistic, the New Stealth Backdoor in Ransomware Intrusions

Mistic is a stealthy backdoor used by KongTuke-linked actors to keep long-term access in ransomware-targeted networks. Mistic is the kind of backdoor that tells you the operator wants time, not noise. Symantec security researchers say it has shown up in financially motivated attacks against insurance, education, IT, and professional services firms, and they link it […]

June 25, 2026
Read More >>

Cisco Catalyst SD-WAN Zero-Day CVE-2026-20245 Exploited Months Before Disclosure

Hackers exploited Cisco Catalyst SD-WAN flaw CVE-2026-20245 as a zero-day months before disclosure, enabling privileged command execution. Google-owned Mandiant reported that an unknown threat actor exploited Cisco Catalyst SD-WAN vulnerability CVE-2026-20245 (CVSS base score of 7.8) as a zero-day at least two months before it was publicly disclosed. The flaw allows an authenticated attacker with […]

June 25, 2026
Read More >>

Nathan Austad Pleads Guilty in DraftKings Hacking Scheme, Gets 18 Months

Third DraftKings hacker gets 18 months in prison for a 2022 credential-stuffing attack that compromised 1,600 accounts and stole $600,000. Nathan Austad, the third person sentenced over the 2022 DraftKings credential-stuffing attack, received 18 months in prison. The group used usernames and passwords stolen from other breaches to access about 1,600 accounts and steal roughly […]

June 25, 2026
Read More >>

Supervised Reinforcement Learning for LLMs on CTF Labs

Follwing up on my recent post [how NOT to train an offensive ai model], I continued doing this experiment to see what more there is to learn about this process.

Tl;dr:

Using data derived from real solutions for interactive CTF labs as training data for LLMs produce surprisingly different results depending on the training data. As this is an interactive process, fully logged and transparent, one can learn a lot about the different failure modes that arise from different forms of the training data. More, elaborated below.

After building what I believe is the best training data I could for this task, as derived from my own benchmark, and running an evaluation of the SFT model (Gemma*, distinct from Gemma base), it appears to be more reliable and successful in solving most single-vuln labs (maxing out some of them, which impacted precise measurement), solved more chain-vuln labs, in fewer steps, and being more deterministic in its solutions.

The method of evaluation here is a standard split/val/train of all the labs I currently have.

Multiple attempts have been made to validate this behavior outside of my own benchmark, in an attempt to replicate this in 3rd party environment as well.

I could not do so reliably and at-scale – so take these results with a grain of salt.

There are multiple ways to improve a model in an interactive learning environment. The leading methods are:

  1. Using a teacher – a larger model whom the smaller one will imitate.

  2. Self-play – the model solves the tasks, and learns from its own solutions

  3. Imitation of human solutions.

I chose neither.

My goal was to build a framework that will, for any given model M, produce a model M*, which is better at web exploitation.

Neither of the methods above provide that solution.

My approach was to use the actual solutions I have for the labs. The advantage for this approach is that one is adding more information to the system that is directly derived from a truth source about the environment it’s attempting to solve. The disadvantage is, that truth is often not behaviorally aligned with how a human or AI interacts with the app.

The solution for this problem, in short, is to take that source of truth and transform it into something that more closely resemble how an actual exploitation looks.

Finding this solution required iterating over how exactly I think this transformation should look. This iteration showed interesting behavior along the way.

Essentially, given the right training data, one could tune a knob and make the model more recon-heavy, payload-focused, or, of course, generically worse than the base model.

I’ve divided this behavior internally into a few buckets, which helped me during this process.

After I settled on what I think is the most balanced and representative dataset of live, interactive, web exploitation – I kicked off doing supervised fine-tuning for the model.

I then evaluated the new model, Gemma* against Gemma base, on many thousands of runs through the val and test splits.

The results are largely positive. On the sub-set of the labs which actually measure generalization, and not memorization, Gemma* consistently beats Gemma. So much so, that my evaluation data is skewed because for labs that Gemma has scored ~80% on, Gemma* consistently got 100%. This skews the results because the improvement could be more than +20pp, but I could not see it under this circumstance.

They’re also positive compared to scale – 64 training labs total. Generally, in attempts to fine-tune AI models of this type, the number I used is 2-3 orders of magnitude smaller than normally accepted.

Which raises my next point about data scarcity.

There is no public, open-source, audit of full-trace to solve CTFs. Unlike coding and other agentic tasks, where there’s a lot of data out there, this format of data is scarce. Specifically, what is scarce is a known, correct, deterministic solution trace for a given CTF.

On principle, I could have automatically built thousands of additional labs – it would have taken me a day – but that wasn’t quite what I was looking to do.

Bottom line:

It appears that, thanks to this data I’ve collected, I was able to get a net positive result on this training run. If I do decide to push up the scale, and perhaps invest more money and train a model larger than Gemma, I could possibly detect some additional improvements that were out-of-scope of the scale of this experiment.

More specifically, this access to correct and grounded results of CTFs proved valuable in this training, in a way that I think simple write-ups for known exploits would not have been.

I used the TarantuBench benchmark in this research, and all interactive labs are available on tarantulabs.com

submitted by /u/dvnci1452
[link] [comments]

June 25, 2026
Read More >>

Europol Disrupts StealC and Amadey Malware Infrastructure in Operation Endgame

Operation Endgame disrupted malware services like StealC and Amadey that enable ransomware, fraud, and attacks on critical infrastructure. Between June 15 and 19, 2026, Europol coordinated a two-week law enforcement operation involving agencies from Canada, Denmark, Germany, the Netherlands, the UK, and the US, alongside private firms like Microsoft, Bitdefender, IBM X-Force, Proofpoint, Infoblox, Shadowserver, […]

June 24, 2026
Read More >>

Why Frontier AI makes prioritization the most important part of your CTEM program

Frontier AI could drive a 10x surge in vulnerabilities. CTEM helps organizations continuously identify, prioritize, and reduce real cyber risk. Your vulnerability management program was not designed for what is coming next. More than 40,000 CVEs were reported in 2025, breaking yet another record. Today, security experts anticipate that frontier AI-powered systems could drive a […]

June 24, 2026
Read More >>