Researchers isolate memorization from reasoning in AI neural networks


Looking ahead, if these information-removal techniques are developed further, AI companies could one day remove, say, copyrighted content, private information, or harmful memorized text from a neural network without destroying its ability to perform transformative tasks. However, because neural networks store information in distributed ways that are still not completely understood, the researchers caution that, for now, their method “cannot guarantee complete elimination of sensitive information.” These are early steps in a new research direction for AI.

Traveling the neural landscape

To understand how researchers from Goodfire distinguished memorization from reasoning in these neural networks, it helps to know about a concept in AI called the “loss landscape”: a way of visualizing how wrong or right an AI model’s predictions are as you adjust its internal settings (called “weights”).

Imagine you’re tuning a complex machine with millions of dials. The “loss” measures the number of mistakes the machine makes. High loss means many errors, low loss means few errors. The “landscape” is what you’d see if you could map out the error rate for every possible combination of dial settings.

During training, AI models essentially “roll downhill” in this landscape through a process called gradient descent, adjusting their weights step by step until they settle into valleys where they make the fewest mistakes. The weights found this way are what produce the model’s outputs, such as answers to questions.
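The “rolling downhill” process can be sketched in a few lines. This is a minimal illustration, not the training code from the paper: a toy model with just two “dials” (weights) and a made-up loss function whose valley sits at (3, −1).

```python
import numpy as np

# Toy "machine with dials": two weights, and a loss (error measure)
# that is lowest at the weight setting (3.0, -1.0). This loss is
# invented for illustration, not taken from the paper.
def loss(w):
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2

def gradient(w):
    # The gradient points uphill; gradient descent steps the other way.
    return np.array([2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)])

w = np.array([0.0, 0.0])   # start at an arbitrary dial setting
lr = 0.1                   # step size ("learning rate")

for _ in range(200):       # "roll downhill": repeated small steps
    w = w - lr * gradient(w)

print(w)        # ends up very close to the valley at (3, -1)
print(loss(w))  # loss is near zero there
```

Real models do the same thing with millions or billions of weights and a loss computed over training data, but the mechanic is identical: follow the downhill direction until the error stops shrinking.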

Figure 1 from the paper “From Memorization to Reasoning in the Spectrum of Loss Curvature”: the researchers collect activations and gradients from a sample of training data (a), which lets them approximate loss curvature with respect to a weight matrix using K-FAC (b). They then decompose these weight matrices into components (each the same size as the matrix), ordered from high to low curvature, and show that in language models, data from different tasks interacts with different parts of the component spectrum (c).

Credit: Merullo et al.

The researchers analyzed the “curvature” of the loss landscapes of particular AI language models, measuring how sensitive the model’s performance is to small changes in different neural network weights. Sharp peaks and valleys represent high curvature (where tiny changes cause big effects), while flat plains represent low curvature (where changes have minimal impact).

Using a technique called K-FAC (Kronecker-Factored Approximate Curvature), they found that individual memorized facts create sharp spikes in this landscape, but because each memorized item spikes in a different direction, when averaged together they create a flat profile. Meanwhile, reasoning abilities that many different inputs rely on maintain consistent moderate curves across the landscape, like rolling hills that remain roughly the same shape regardless of the direction from which you approach them.
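K-FAC makes this curvature analysis tractable by approximating the curvature (Fisher) matrix of a layer as a Kronecker product of two small covariance matrices: one over the layer’s input activations and one over the gradients at its output. The sketch below shows that core idea on a toy linear layer with random stand-in data; it is a simplified illustration, not the paper’s implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear layer y = W a: a batch of input activations `a` and
# back-propagated output gradients `g` (random stand-ins here).
n_in, n_out, batch = 4, 3, 1000
a = rng.normal(size=(batch, n_in))    # layer inputs
g = rng.normal(size=(batch, n_out))   # gradients w.r.t. layer outputs

# K-FAC approximates the layer's curvature matrix as the Kronecker
# product of two small covariances, instead of one huge matrix:
A = a.T @ a / batch            # input covariance,    (n_in x n_in)
G = g.T @ g / batch            # gradient covariance, (n_out x n_out)
fisher_approx = np.kron(A, G)  # (n_in*n_out) x (n_in*n_out)

# The eigenvalues of A kron G are all products of A's and G's
# eigenvalues; sorting them orders weight-space directions from
# high curvature (sharp) to low curvature (flat).
eigs = np.sort(np.linalg.eigvalsh(fisher_approx))[::-1]
print(eigs[:3])  # the sharpest (highest-curvature) directions
```

The payoff is scale: storing and decomposing the two small factors is vastly cheaper than handling the full curvature matrix, which is what makes this kind of analysis feasible on real language models.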


