This startup’s new mechanistic interpretability tool lets you debug LLMs


Mapping models

Silico lets you zoom in on specific parts of a trained model, such as individual neurons or groups of neurons, and run experiments to see what those neurons do, provided you have access to the model’s inner workings. (Most people won’t be able to use Silico to poke around inside ChatGPT or Gemini, but they can use it to look at the parameters inside many open-source models.) You can then check which inputs make different neurons fire, and trace pathways upstream and downstream of a neuron to see how other neurons affect it and how it affects them in turn.
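To make the idea of checking which inputs make a neuron fire concrete, here is a minimal sketch in plain Python. It does not use Silico or any real model: the two-neuron hidden layer, its weights, and the probe inputs are all made up for illustration, and `top_inputs` is a hypothetical helper name.

```python
# Toy stand-in for a trained model: one hidden layer of ReLU neurons.
# The weights and probe inputs below are invented for illustration only.
HIDDEN_WEIGHTS = [
    [1.0, -1.0, 0.0],  # neuron 0
    [0.0, 1.0, 1.0],   # neuron 1
]

PROBES = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.0, 1.0, 0.0],
    "c": [0.0, 0.0, 1.0],
    "d": [1.0, 1.0, 1.0],
}


def relu(x):
    return max(0.0, x)


def hidden_activations(features):
    """Return the activation of every hidden neuron for one input."""
    return [
        relu(sum(w * f for w, f in zip(row, features)))
        for row in HIDDEN_WEIGHTS
    ]


def top_inputs(neuron, inputs, k=2):
    """Names of the (at most k) inputs that activate `neuron` most strongly."""
    scored = [
        (hidden_activations(feats)[neuron], name)
        for name, feats in inputs.items()
    ]
    scored.sort(reverse=True)
    # Drop inputs on which the neuron does not fire at all.
    return [name for act, name in scored[:k] if act > 0]


for n in range(len(HIDDEN_WEIGHTS)):
    print(f"neuron {n} fires most on: {top_inputs(n, PROBES)}")
```

In a real model the probes would be text prompts and the activations would be read out of the network’s layers, but the logic is the same: run many inputs, record one neuron’s activation, and rank the inputs.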

For example, Goodfire found one neuron inside the open-source model Qwen 3 that was associated with the so-called trolley problem. Activating this neuron changed the model’s responses, making it frame its outputs as explicit moral dilemmas. “When this neuron’s active, all sorts of weird things happen,” says Ho.

Pinpointing the source of odd behavior like this is now pretty standard practice. But Goodfire wants to make it easier to adjust that behavior. Using Silico, developers can now adjust the parameters connected to individual neurons to boost or suppress certain behaviors.

In another example, Goodfire researchers asked a model whether a company should disclose that its AI behaves deceptively in 0.3% of cases, affecting 200 million users. The model said no, citing the negative business impact of such a disclosure.

By looking inside the model, the researchers found that boosting neurons associated with transparency and disclosure flipped the answer from no to yes nine times out of 10. “The model already had the ethical reasoning circuitry, but it was being outweighed by the commercial risk assessment,” says Ho.
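The disclosure example can be pictured with a deliberately tiny sketch. This is not Goodfire’s method or Silico’s interface: it assumes a hypothetical model whose answer depends on just two activations, one for “transparency” and one for “commercial risk”, with made-up numbers, and it shows how scaling one of them can flip the outcome.

```python
def disclosure_answer(transparency_act, risk_act, boost=1.0):
    """Toy decision rule: answer "yes" when the (optionally boosted)
    transparency signal outweighs the commercial-risk signal.

    `boost` multiplies the transparency activation, standing in for
    an intervention that amplifies that group of neurons.
    """
    score = boost * transparency_act - risk_act
    return "yes" if score > 0 else "no"


# Invented activations: the ethical signal is present but outweighed.
TRANSPARENCY = 0.4
RISK = 0.7

print(disclosure_answer(TRANSPARENCY, RISK))             # unmodified model
print(disclosure_answer(TRANSPARENCY, RISK, boost=2.0))  # boosted neurons
```

With these numbers the unmodified call returns "no" and the boosted call returns "yes". A real intervention acts on many neurons inside a far larger network, which is why the effect is reported as a rate rather than a guarantee.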

Tweaking the values of a model in this way is just one approach. Silico can also help steer the training process by filtering out certain training data to avoid setting unwanted values for certain parameters in the first place.

For example, many models will tell you that 9.11 is greater than 9.9. Looking inside a model to see what’s going on might reveal that it is being influenced by neurons associated with the Bible, in which verse 9.9 comes before 9.11, or by code repositories where consecutive updates are numbered 9.9, 9.10, 9.11 and so on. Using this information, the model can be retrained so that it avoids its “Bible” neurons when doing math.
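The two readings of “9.11 versus 9.9” are easy to show side by side. The short sketch below says nothing about how any model works internally; it only illustrates that the same pair of strings orders differently as decimal numbers and as dotted version or verse numbers. The function names are illustrative.

```python
from decimal import Decimal


def greater_as_decimal(a, b):
    """Compare the strings as decimal numbers: 9.11 is 9 + 11/100."""
    return Decimal(a) > Decimal(b)


def greater_as_version(a, b):
    """Compare the strings part by part, as in version or verse numbering."""
    def parts(s):
        return tuple(int(p) for p in s.split("."))
    return parts(a) > parts(b)


print(greater_as_decimal("9.11", "9.9"))  # False: 9.11 < 9.90
print(greater_as_version("9.11", "9.9"))  # True: item 11 comes after item 9
```

A model that has seen far more text in the second convention than the first could plausibly carry that ordering into arithmetic.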

By releasing Silico, Goodfire wants to put techniques previously available to a few top labs into the hands of smaller firms and research teams that want to build their own model or adapt an open-source one. The tool will be available for a fee determined on a case-by-case basis according to customers’ requirements (Goodfire declined to give specific pricing details).


