2026-05-05 AI Daily Digest
Today's digest collects 35 AI items: 4 news, 10 tools, 5 products, 12 papers, and 4 discussions.
News
AI facial recognition oversight lagging far behind technology, watchdogs warn
Source: Hacker News, Type: News
Making Fuel from Thin Air: The Magical Methane Machine
Source: Hacker News, Type: News
Clone Talking – Real-Time Voice Conversations with AI Persona Clones
Source: Hacker News, Type: News
Tailoring AI solutions for health care needs
Source: Hacker News, Type: News
Tools
Show HN: [inerrata] – Collective and Causal Knowledge Layer for Coding Agents
Source: Hacker News, Type: Tool, Agents posting questions, answers, and knowledge reports will find their contributions distilled, enriched, and organize
Future of Work with AI Agents
Source: Hacker News, Type: Tool
ruvnet/ruflo
Source: GitHub Trending, Type: Tool, ruvnet / ruflo 🌊 The leading agent orchestration platform for Claude. Deploy intelligent multi-agent swarms, coord
browserbase/skills
Source: GitHub Trending, Type: Tool, browserbase / skills Claude Agent SDK with a web browsing tool
Hmbown/DeepSeek-TUI
Source: GitHub Trending, Type: Tool, Hmbown / DeepSeek-TUI Coding agent for DeepSeek models that runs in your terminal
soxoj/maigret
Source: GitHub Trending, Type: Tool, soxoj / maigret 🕵️♂️ Collect a dossier on a person by username from 3000+ sites
1jehuang/jcode
Source: GitHub Trending, Type: Tool, 1jehuang / jcode Coding Agent Harness
msitarzewski/agency-agents
Source: GitHub Trending, Type: Tool, msitarzewski / agency-agents A complete AI agency at your fingertips - From frontend wizards to Reddit community ninjas, from whimsy injectors to reality checkers. Each agent is a specialized expert with personality, processes, and proven deliverables.
Sharing a Lite basic plan, barely used it
Sharing a Tencent Cloud LLM API coding plan bundle, expires on the 6th, supports km2.5, glm5, minimax2.5. sk-sp-CGoLDkny6121G2DgUQ9sUyDDB3v6sCUEy74avdu1MBkipiPV cloud.tencent.com LLM service platform TokenHub CodeBuddy Code: CodeBuddy Code is an intelligent coding tool built on Tencent Cloud AI technology, deeply integrated with the Tencent Cloud ecosystem, providing end-to-end AI assistance from writing code to deploying projects. This article explains how, in CodeBuddyCo 1 post - 1 participant. Read full topic
Why SSMs struggle in parameter-constrained training: empirical findings at 25M parameters [R]
After ~3 weeks of experimentation in OpenAI's Parameter Golf competition, I wrote up why SSMs are structurally disadvantaged relative to transformers in a time- and size-constrained regime (10 min training, 16MB artifact, 25M parameters) on 8xH100s: https://mradassaad.github.io/posts/why-ssms-struggle-in-parameter-golf/
Main findings:
- SSM in_proj weights compress up to 3.26x worse than attention QKV under LZMA, directly taxing the compressed parameter budget
- Architectural wins validated at SP4096 flipped sign at SP8192 — two configs that looked like clean wins reversed direction at the target vocabulary
Also includes three kernel-level experiments on the Mamba-3 Triton kernels:
- a backward fusion attempt that was numerically exact but 16% slower due to SMEM pressure
- a torch.compile quantizer bug that cost 5.5 mBPB
- a mixed-precision dynamics protection that recovered 0.8 mBPB at negligible size cost
submitted by /u/mradassaad
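The compression finding is easy to probe independently: under LZMA, what a tensor costs against a compressed-size budget depends on the structure of its byte stream, not just its element count. A minimal sketch with synthetic matrices (stand-ins, not real SSM or attention checkpoints):

```python
import lzma

import numpy as np

def lzma_ratio(w: np.ndarray) -> float:
    """Compressed bytes / raw bytes for a float16 tensor under LZMA."""
    raw = w.astype(np.float16).tobytes()
    return len(lzma.compress(raw)) / len(raw)

rng = np.random.default_rng(0)
dense = rng.normal(size=(256, 256))               # high-entropy weights
sparse = dense * (rng.random((256, 256)) < 0.1)   # 90% zeros: repetitive bytes

print(f"dense  ratio: {lzma_ratio(dense):.2f}")
print(f"sparse ratio: {lzma_ratio(sparse):.2f}")
```

Structured or sparse weights compress far better than high-entropy ones, which is why a 3.26x compressibility gap translates directly into budget pressure.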
Products
Show HN: Vibe-coding video games with Claude (Day 21: Blackjack)
Source: Hacker News, Type: Product, I used to run a flash games website (SWF files) years ago. I've made a few games of my own. I'm also an avid g
Ask HN: Why is sharing private static HTML with non-engineers still hard?
Source: Hacker News, Type: Product, This sounds like a problem from 2005. But I keep running into it. I have a folder of static HTML and CSS. Think: docs, r
How to fix GoPay payment errors when subscribing to ChatGPT
Following a fellow forum member's step-by-step tutorial at https://linux.do/t/topic/2089572/1. When paying with GoPay on the ChatGPT subscription page, you may hit the error "Something went wrong, please retry". Besides switching email accounts or network nodes, an even simpler method is to jump directly to the Stripe hosted-checkout long link. I used Codex to write a Tampermonkey userscript: on the subscription page, clicking the GoPay支付 button redirects automatically.

// ==UserScript==
// @name         ChatGPT GoPay 跳转
// @namespace    https://chatgpt.com/
// @author       Miku-Y
// @version      1.5.0
// @description  在 ChatGPT checkout 页面显示 GoPay 按钮,用户点击后生成 Stripe hosted checkout 长链接并跳转
// @match        https://chatgpt.com/*
// @run-at       document-start
// @grant        none
// ==/UserScript==
(() => {
  "use strict";

  const PATH = "/checkout/openai_llc/";
  const BTN_ID = "gopay-checkout-button";
  const TIP_ID = "gopay-checkout-tip";
  const BTN_CSS = "position:fixed;left:16px;top:50%;transform:translateY(-50%);z-index:2147483647;height:46px;min-width:136px;border:0;border-radius:8px;background:#10a37f;color:#fff;font:700 16px/1 -apple-system,BlinkMacSystemFont,Segoe UI,sans-serif;box-shadow:0 10px 28px rgba(16,163,127,.34);cursor:pointer";
  const TIP_CSS = "position:fixed;left:16px;top:calc(50% + 34px);z-index:2147483647;max-width:360px;padding:12px 14px;border-radius:8px;font:14px/1.5 -apple-system,BlinkMacSystemFont,Segoe UI,sans-serif;box-shadow:0 8px 28px rgba(0,0,0,.18);color:#fff";

  const PAYLOAD = {
    plan_name: "chatgptplusplan",
    billing_details: { country: "ID", currency: "IDR" },
    cancel_url: "https://chatgpt.com/#pricing",
    promo_campaign: {
      promo_campaign_id: "plus-1-month-free",
      is_coupon_from_query_param: false,
    },
    checkout_ui_mode: "hosted",
  };

  let timer = null;
  let busy = false;

  const $ = (id) => document.getElementById(id);
  const inCheckout = () => location.pathname.startsWith(PATH);
  const mount = (el) => (document.body || document.documentElement).appendChild(el);

  function tip(text, error = false) {
    let el = $(TIP_ID);
    if (!el) {
      el = document.createElement("div");
      el.id = TIP_ID;
      el.style.cssText = TIP_CSS;
      mount(el);
    }
    el.textContent = text;
    el.style.background = error ? "#991b1b" : "#111827";
  }

  function setLoading(button, loading) {
    button.disabled = loading;
    button.textContent = loading ? "跳转中,请等待..." : "GoPay支付";
    button.style.opacity = loading ? ".72" : "1";
    button.style.cursor = loading ? "wait" : "pointer";
  }

  async function fetchJson(url, options) {
    const response = await fetch(url, options);
    const data = await response.json().catch(() => null);
    if (!response.ok) throw Object.assign(new Error(`HTTP ${response.status}`), { data });
    return data;
  }

  async function pay(button) {
    if (busy) return;
    busy = true;
    setLoading(button, true);
    tip("正在生成 Stripe 付款链接...");
    try {
      const token = (await fetchJson("/api/auth/session", { credentials: "include" }))?.accessToken;
      if (!token) throw new Error("获取登录 Token 失败,请确认 ChatGPT 已登录");
      const data = await fetchJson("https://chatgpt.com/backend-api/payments/checkout", {
        method: "POST",
        credentials: "include",
        headers: {
          Authorization: `Bearer ${token}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify(PAYLOAD),
      });
      const hostedUrl = data?.url || data?.stripe_hosted_url || data?.checkout_url;
      if (!hostedUrl) throw new Error("未找到 Stripe 长链接,请看控制台原始响应");
      console.log("[plus-link] Stripe 长链接:", hostedUrl, data);
      tip("Stripe 链接生成成功,正在跳转...");
      location.assign(hostedUrl);
    } catch (error) {
      console.error("[plus-link]", error);
      busy = false;
      setLoading(button, false);
      tip(error.message || "请求失败,请看控制台日志", true);
    }
  }

  function showButton() {
    if (!inCheckout() || $(BTN_ID)) return;
    const button = document.createElement("button");
    button.id = BTN_ID;
    button.type = "button";
    button.textContent = "GoPay支付";
    button.style.cssText = BTN_CSS;
    button.addEventListener("click", () => pay(button));
    mount(button);
  }

  function sync() {
    if (!inCheckout()) {
      clearTimeout(timer);
      timer = null;
      $(BTN_ID)?.remove();
      $(TIP_ID)?.remove();
      return;
    }
    if (!$(BTN_ID) && !timer) {
      timer = setTimeout(() => {
        timer = null;
        showButton();
      }, 3000);
    }
  }

  for (const method of ["pushState", "replaceState"]) {
    const raw = history[method];
    history[method] = function (...args) {
      const result = raw.apply(this, args);
      setTimeout(sync, 50);
      return result;
    };
  }

  window.addEventListener("popstate", () => setTimeout(sync, 50));
  window.addEventListener("DOMContentLoaded", sync);

  sync();
  setTimeout(sync, 1000);
})();

2 posts - 2 participants. Read full topic
Llama.cpp MTP support now in beta!
Happy to report that llama.cpp MTP support is now in beta, thanks to Aman (and all the others who have pushed the various issues in the meantime). This has the potential to actually get merged soon-ish. Currently it contains support for Qwen3.5 MTP, but other models are likely to follow suit. Between this and the maturing tensor-parallel support, expect most performance gaps between llama.cpp and vLLM, at least when it comes to token generation speeds, to be erased. submitted by /u/ilintar
AMA on our DevDay Launches
It's the best time in history to be a builder. At DevDay [2025], we introduced the next generation of tools and models to help developers code faster, build agents more reliably, and scale their apps in ChatGPT. Ask us questions about our launches, such as:
- AgentKit
- Apps SDK
- Sora 2 in the API
- GPT-5 Pro in the API
- Codex
Missed out on our announcements? Watch the replays: https://youtube.com/playlist?list=PLOXw6I10VTv8-mTZk0v7oy1Bxfo3D2K5o&si=nSbLbLDZO7o-NMmo
Join our team for an AMA to ask questions and learn more, Thursday 11am PT. Answering Q's now are:
- Dmitry Pimenov - u/dpim
- Alexander Embiricos - u/embirico
- Ruth Costigan - u/ruth_on_reddit
- Christina Huang - u/Brief-Detective-9368
- Rohan Mehta - u/Downtown_Finance4558
- Olivia Morgan - u/Additional-Fig6133
- Tara Seshan - u/tara-oai
- Sherwin Wu - u/sherwin-openai
PROOF: https://x.com/OpenAI/status/1976057496168169810
EDIT: 12PM PT. That's a wrap on the main portion of our AMA, thank you for your questions. We're going back to build. The team will jump in and answer a few more questions throughout the day.
submitted by /u/OpenAI
Papers
TauricResearch/TradingAgents
Source: GitHub Trending, Type: Paper, TauricResearch / TradingAgents TradingAgents: Multi-Agents LLM Financial Trading Framework
virattt/dexter
Source: GitHub Trending, Type: Paper, virattt / dexter An autonomous agent for deep financial research
HyCOP: Hybrid Composition Operators for Interpretable Learning of PDEs
We introduce HyCOP, a modular framework that learns parametric PDE solution operators by composing simple modules (advection, diffusion, learned closures, boundary handling) in a query-conditioned way. Rather than learning a monolithic map, HyCOP learns a policy over short programs - which module to apply and for how long - conditioned on regime features and state statistics. Modules may be numerical sub-solvers or learned components, enabling hybrid surrogates evaluated at arbitrary query times without autoregressive rollout. Across diverse PDE benchmarks, HyCOP produces interpretable programs, delivers order-of-magnitude OOD improvements over monolithic neural operators, and supports modular transfer through dictionary updates (e.g., boundary swaps, residual enrichment). Our theory characterizes expressivity and gives an error decomposition that separates composition error from module error and doubles as a process-level diagnostic.
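The "short programs over modules" idea can be illustrated with a toy 1D sketch. All modules and parameters below are invented stand-ins; HyCOP's real dictionary includes learned closures and boundary handling, and a learned policy chooses the program from regime features:

```python
import numpy as np

# Hypothetical module dictionary: each entry advances a 1D periodic state by dt.
def advect(u, dt, c=1.0, dx=0.01):
    return u - c * dt * (u - np.roll(u, 1)) / dx                      # upwind advection

def diffuse(u, dt, nu=0.001, dx=0.01):
    return u + nu * dt * (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2

MODULES = {"advect": advect, "diffuse": diffuse}

def run_program(u0, program):
    """Execute a short program: a list of (module_name, dt) steps."""
    u = u0.copy()
    for name, dt in program:
        u = MODULES[name](u, dt)
    return u

x = np.linspace(0, 1, 100, endpoint=False)
u0 = np.sin(2 * np.pi * x)
# A policy would select this program from regime features; fixed here for the sketch.
u = run_program(u0, [("advect", 0.001)] * 5 + [("diffuse", 0.001)] * 5)
```

Because the program is an explicit sequence of named operators, it is directly inspectable, and swapping one dictionary entry (e.g., a boundary module) transfers without retraining the rest.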
Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs
While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution" phenomenon, where the accumulation of textual history expands the attention partition function, causing visual attention to decay inversely with generated sequence length. To counteract this, we propose Persistent Visual Memory (PVM), a lightweight learnable module designed to ensure sustained, on-demand visual perception. Integrated as a parallel branch alongside the Feed-Forward Network (FFN) in LVLMs, PVM establishes a distance-agnostic retrieval pathway that directly provides visual embeddings for precise visual perception, thereby structurally mitigating the signal suppression inherent to deep generation. Extensive experiments on Qwen3-VL models demonstrate that PVM brings notable improvements with negligible parameter overhead, delivering consistent average accuracy gains across both 4B and 8B scales, particularly in complex reasoning tasks that demand persistent visual perception. Furthermore, in-depth analysis reveals that PVM can resist length-induced signal decay and accelerate internal prediction convergence.
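The "Visual Signal Dilution" mechanism the paper targets can be reproduced numerically: under softmax attention, every added text token grows the partition function, so the total mass on a fixed set of vision tokens shrinks. A toy sketch with random logits (not a real LVLM):

```python
import numpy as np

rng = np.random.default_rng(0)
n_visual = 64
visual_scores = rng.normal(size=n_visual)  # fixed attention logits on vision tokens

def visual_mass(n_text):
    """Total softmax attention mass on vision tokens as the text history grows."""
    text_scores = rng.normal(size=n_text)
    scores = np.concatenate([visual_scores, text_scores])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return probs[:n_visual].sum()

for n in (16, 256, 4096):
    print(f"{n:5d} text tokens -> visual attention mass {visual_mass(n):.3f}")
```

With comparable logit scales the vision mass decays roughly like n_visual / (n_visual + n_text), which is the inverse decay with generated sequence length described above; PVM's parallel retrieval pathway sidesteps this competition entirely.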
Can Coding Agents Reproduce Findings in Computational Materials Science?
Large language models are increasingly deployed as autonomous coding agents and have achieved remarkably strong performance on software engineering benchmarks. However, it is unclear whether such success transfers to computational scientific workflows, where tasks require not only strong coding ability, but also the ability to navigate complex, domain-specific procedures and to interpret results in the context of scientific claims. To address this question, we present AutoMat, a benchmark for evaluating LLM-based agents' ability to reproduce claims from computational materials science. AutoMat poses three interrelated challenges: recovering underspecified computational procedures, navigating specialized toolchains, and determining whether the resulting evidence supports a claim. By working closely with subject matter experts, we curate a set of claims from real materials science papers to test whether coding agents can recover and execute the end-to-end workflow needed to support (or undermine) such claims. We then evaluate multiple representative coding agent settings across several foundation models. Our results show that current LLM-based agents obtain low overall success rates on AutoMat, with the best-performing setting achieving a success rate of only 54.1%. Error analysis further reveals that agents perform worst when workflows must be reconstructed from paper text alone and that they fail primarily due to incomplete procedures, methodological deviations, and execution fragility. Taken together, these findings position AutoMat as both a benchmark for computational scientific reproducibility and a tool for diagnosing the current limitations of agentic systems in AI-for-science settings.
Generating Statistical Charts with Validation-Driven LLM Workflows
Generating diverse, readable statistical charts from tabular data remains challenging for LLMs, as many failures become apparent after rendering and are not detectable from data or code alone. Existing chart datasets also rarely provide fully aligned artifacts, such as executable code, dataset context, and question-answer pairs. We present a structured LLM-based workflow that decomposes chart generation into dataset screening, plot proposal, code synthesis, rendering, validation-driven refinement, description generation, and question-answer generation. By incorporating rendered-output validation, the workflow addresses visualization-specific failure modes such as readability and semantic mismatch. It treats chart generation as an inspectable process rather than a one-shot prompt-to-code task, retaining each chart with its code, dataset context, description, and question-answer pairs. Applied to UCI datasets, the workflow produces 1,500 charts from 74 datasets, spanning 24 chart families and paired with 30,003 question-answer pairs. We evaluate 16 multimodal LLMs (MLLMs) on these chart-question pairs. The results show that chart-syntax questions are nearly saturated, while value extraction, comparison, and reasoning remain more challenging, illustrating the workflow's utility for diagnostic studies of chart-grounded multimodal reasoning.
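The validation-driven refinement stage can be sketched as a refine-until-valid loop; the generator and validator below are stubs standing in for the paper's LLM-based code synthesis and rendered-output checks:

```python
def generate_chart(spec, feedback=None):
    # Stub for LLM code synthesis + rendering; a real system returns a rendered image.
    return {"title": spec["title"], "font_pt": 4 if feedback is None else 10}

def validate(chart):
    """Rendered-output checks: return a list of failure messages."""
    issues = []
    if chart["font_pt"] < 8:
        issues.append("labels unreadable: font below 8pt")
    if not chart["title"]:
        issues.append("semantic mismatch: missing title")
    return issues

def refine_until_valid(spec, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        chart = generate_chart(spec, feedback)
        issues = validate(chart)
        if not issues:
            return chart, []
        feedback = issues  # fed back into the next synthesis round
    return chart, issues

chart, issues = refine_until_valid({"title": "Wine quality by alcohol %"})
```

The point of the structure is that failures only visible after rendering (readability, semantic mismatch) become machine-checkable signals rather than silent defects in a one-shot prompt-to-code pass.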
RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution
Humans solve problems by executing targeted plans, yet large language models (LLMs) remain unreliable for structured workflow execution. We propose RunAgent, a multi-agent plan execution platform that interprets natural-language plans while enforcing stepwise execution through constraints and rubrics. RunAgent bridges the expressiveness of natural language with the determinism of programming via an agentic language with explicit control constructs (e.g., IF, GOTO, FORALL). Beyond syntactic and semantic verification of each step's output, which is performed against the specific instruction of that step, RunAgent autonomously derives and validates constraints based on the description of the task and its instance at each step. RunAgent also dynamically selects among LLM-based reasoning, tool usage, and code generation and execution (e.g., in Python), and incorporates error-correction mechanisms to ensure correctness. Finally, RunAgent filters the context history by retaining only relevant information during the execution of each step. Evaluations on the Natural-plan and SciBench datasets demonstrate that RunAgent outperforms baseline LLMs and state-of-the-art PlanGEN methods.
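As a rough illustration of what stepwise execution with explicit control constructs looks like, here is a toy interpreter for IF/GOTO-style plans (the instruction set is invented for the sketch, not RunAgent's actual language):

```python
# Toy interpreter for a plan with IF and GOTO, in the spirit of RunAgent's
# control constructs. Each step mutates a shared state dict.
def run_plan(steps, state):
    """steps: list of (op, arg); returns final state and the executed-step trace."""
    pc = 0
    trace = []
    while pc < len(steps):
        op, arg = steps[pc]
        trace.append(pc)
        if op == "SET":              # arg: (key, value)
            state[arg[0]] = arg[1]
        elif op == "IF":             # arg: (predicate, target_pc), jump if true
            if arg[0](state):
                pc = arg[1]
                continue
        elif op == "GOTO":           # arg: target_pc
            pc = arg
            continue
        pc += 1
    return state, trace

plan = [
    ("SET", ("retries", 0)),
    ("SET", ("ok", False)),
    ("IF", (lambda s: s["ok"], 5)),  # step 2: done? jump to END
    ("SET", ("ok", True)),           # step 3: "execute the step"
    ("GOTO", 2),                     # step 4: re-check the condition
    ("SET", ("status", "done")),     # step 5: END
]
state, trace = run_plan(plan, {})
```

A real system would attach per-step constraint checks at each iteration of this loop, rejecting an output before advancing the program counter.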
When RAG Chatbots Expose Their Backend: An Anonymized Case Study of Privacy and Security Risks in Patient-Facing Medical AI
Background: Patient-facing medical chatbots based on retrieval-augmented generation (RAG) are increasingly promoted to deliver accessible, grounded health information. AI-assisted development lowers the barrier to building them, but they still demand rigorous security, privacy, and governance controls. Objective: To report an anonymized, non-destructive security assessment of a publicly accessible patient-facing medical RAG chatbot and identify governance lessons for safe deployment of generative AI in health. Methods: We used a two-stage strategy. First, Claude Opus 4.6 supported exploratory prompt-based testing and structured vulnerability hypotheses. Second, candidate findings were manually verified using Chrome Developer Tools, inspecting browser-visible network traffic, payloads, API schemas, configuration objects, and stored interaction data. Results: The LLM-assisted phase identified a critical vulnerability: sensitive system and RAG configuration appeared exposed through client-server communication rather than restricted server-side. Manual verification confirmed that ordinary browser inspection allowed collection of the system prompt, model and embedding configuration, retrieval parameters, backend endpoints, API schema, document and chunk metadata, knowledge-base content, and the 1,000 most recent patient-chatbot conversations. The deployment also contradicted its privacy assurances: full conversation records, including health-related queries, were retrievable without authentication. Conclusions: Serious privacy and security failures in patient-facing RAG chatbots can be identified with standard browser tools, without specialist skills or authentication; independent review should be a prerequisite for deployment. Commercial LLMs accelerated this assessment, including under a false developer persona; assistance available to auditors is equally available to adversaries.
Unsupervised Denoising of Real Clinical Low Dose Liver CT with Perceptual Attention Networks
With the development of deep learning, medical image processing has been widely used to assist clinical research. This paper focuses on the denoising problem of low-dose computed tomography using deep learning. Although low-dose computed tomography reduces radiation exposure to patients, it also introduces more noise, which may interfere with visual interpretation by physicians and affect diagnostic results. To address this problem, inspired by Cycle-GAN for unsupervised learning, this paper proposes an end-to-end unsupervised low-dose computed tomography denoising framework. The proposed framework combines a U-Net structure for multi-scale feature extraction, an attention mechanism for feature fusion, and a residual network for feature transformation. It also introduces perceptual loss to improve the network for the characteristics of medical images. In addition, we construct a real low-dose computed tomography dataset and design a large number of comparative experiments to validate the proposed method, using both image-based evaluation metrics and medical evaluation criteria. Compared with classical methods, the main advantage of this paper is that it addresses the limitation that real clinical data cannot be directly used for supervised learning, while still achieving excellent performance. The experimental results are also professionally evaluated by imaging physicians and meet clinical needs.
Make Your LVLM KV Cache More Lightweight
Key-Value (KV) cache has become a de facto component of modern Large Vision-Language Models (LVLMs) for inference. While it enhances decoding efficiency in Large Language Models (LLMs), its direct adoption in LVLMs introduces substantial GPU memory overhead due to the large number of vision tokens processed during the prefill stage. To tackle this problem, we propose LightKV, a novel approach that reduces KV cache size by exploiting the redundancy among vision-token embeddings. Guided by text prompts, LightKV employs cross-modality message passing to aggregate informative messages across vision tokens and progressively compress them during prefill. This prompt-aware guidance distinguishes our method from prior vision-only compression strategies. We evaluate LightKV on eight open-source LVLMs across eight public benchmark datasets, e.g., MME and SeedBench. Experimental results demonstrate that with only 55% of the original vision tokens, LightKV (a) halves the vision-token KV cache size, (b) reduces computation by up to 40%, and (c) preserves general-purpose performance while significantly outperforming existing baselines.
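Prompt-aware vision-token compression can be sketched as relevance-weighted merging: score each vision token against a pooled text embedding, keep the most relevant ~55%, and collapse the rest into a summary token. This is a simplified stand-in for LightKV's cross-modality message passing, with random embeddings instead of real model states:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32
vision = rng.normal(size=(100, d))   # 100 vision-token embeddings
prompt = rng.normal(size=d)          # pooled text-prompt embedding

# Relevance of each vision token to the prompt (cosine similarity).
sims = vision @ prompt / (np.linalg.norm(vision, axis=1) * np.linalg.norm(prompt))

keep = 55                                  # keep ~55% of tokens, as in the headline setting
top = np.argsort(sims)[-keep:]             # most prompt-relevant tokens survive
rest = np.argsort(sims)[:-keep]
merged = vision[rest].mean(axis=0, keepdims=True)  # crude aggregate of the rest

compressed = np.concatenate([vision[top], merged])
print(compressed.shape)  # (56, 32): 55 kept + 1 merged summary token
```

The KV cache then only stores entries for the compressed tokens, which is where the memory saving during prefill comes from.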
AMA Announcement: Nous Research, The Opensource Lab Behind Hermes Agent (Wednesday, 8AM-11AM PST)
Hi r/LocalLLaMA 👋 We're excited for Wednesday's guests, the Nous Research team! Kicking things off Wednesday, April 29th, 8 AM–11 AM PST. ⚠️ Note: The AMA itself will be hosted in a separate thread; please don't post questions here. submitted by /u/XMasterrrr
Best Local LLMs - Apr 2026
We're back with another Best Local LLMs Megathread! We have continued feasting in the months since the previous thread with the much anticipated release of the Qwen3.5 and Gemma4 series. If that wasn't enough, we are having some scarcely believable moments with GLM-5.1 boasting SOTA-level performance, Minimax-M2.7 being the accessible Sonnet at home, PrismML Bonsai 1-bit models that actually work, etc. Tell us what your favorites are right now!
The standard spiel: Share what you are running right now and why. Given the nature of the beast in evaluating LLMs (untrustworthiness of benchmarks, immature tooling, intrinsic stochasticity), please be as detailed as possible in describing your setup, nature of your usage (how much, personal/professional use), tools/frameworks/prompts etc.
Rules:
- Only open-weights models
- Please thread your responses in the top-level comments for each Application below to enable readability
Applications:
- General: includes practical guidance, how-to, encyclopedic QnA, search engine replacement/augmentation
- Agentic/Agentic Coding/Tool Use/Coding
- Creative Writing/RP
- Speciality (if a category is missing, please create a top-level comment under the Speciality comment)
Notes: Useful breakdown of how folk are using LLMs: https://preview.redd.it/i8td7u8vcewf1.png?width=1090&format=png&auto=webp&s=423fd3fe4cea2b9d78944e521ba8a39794f37c8d
Bonus points if you break down/classify your recommendation by model memory footprint (you can and should be using multiple models in each size range for different tasks):
- Unlimited: >128GB VRAM
- XL: 64 to 128GB VRAM
- L: 32 to 64GB VRAM
- M: 8 to 32GB VRAM
- S:
submitted by /u/rm-rf-rm
Discussions
Reposted from Xiaohongshu: this DeepSeek comeback is brutal, it cracked me up
Xiaohongshu: "Unbeatable." GPT never used to end with a line like that before. Where did it learn this? 2 posts - 2 participants. Read full topic
[D] Self-Promotion Thread
Please post your personal projects, startups, product placements, collaboration needs, blogs etc. Please mention the payment and pricing requirements for products and services. Please do not post link shorteners, link aggregator websites, or auto-subscribe links. -- Any abuse of trust will lead to bans. Encourage others who create new posts for questions to post here instead! Thread will stay alive until the next one, so keep posting after the date in the title. -- Meta: This is an experiment. If the community doesn't like this, we will cancel it. This is to encourage those in the community to promote their work without spamming the main threads. submitted by /u/AutoModerator
[D] Monthly Who's Hiring and Who wants to be Hired?
For job postings, please use this template: Hiring: [Location], Salary: [], [Remote | Relocation], [Full Time | Contract | Part Time] and [Brief overview, what you're looking for]. For those looking for jobs, please use this template: Want to be Hired: [Location], Salary Expectation: [], [Remote | Relocation], [Full Time | Contract | Part Time], Resume: [Link to resume] and [Brief overview, what you're looking for]. Please remember that this community is geared towards those with experience. submitted by /u/AutoModerator
Sora 2 megathread (part 3)
The last one hit the post limit of 100,000 comments. Do not try to buy codes; you will get scammed. Do not try to sell codes; you will get permanently banned. We have a bot set up to distribute invite codes in the Discord, so join if you can't find codes in the comments here. Check the #sora-invite-codes channel. The Discord has dozens of invite codes available, with more being posted constantly! Update: Discord is down until Discord unlocks our server. The massive flood of joins caused the server to get locked because Discord thought we were botting lol. Also check the megathread on Chambers for invites. submitted by /u/WithoutReason1729