
I unified convolution and attention into a single framework


The operational primitives of deep learning, primarily matrix multiplication and convolution, exist as a fragmented landscape of highly specialized tools. This paper introduces the Generalized Windowed Operation (GWO), a theoretical framework that unifies these operations by decomposing them into three orthogonal components: Path, defining operational locality; Shape, defining geometric structure and underlying symmetry assumptions; and Weight, defining feature importance. We elevate this framework to a predictive theory grounded in two fundamental principles. First, we introduce the Principle of Structural Alignment, which posits that optimal generalization is achieved when the GWO's (P, S, W) configuration mirrors the data's intrinsic structure. Second, we show that this principle is a direct consequence of the Information Bottleneck (IB) principle. To formalize this, we define an Operational Complexity metric based on Kolmogorov complexity. However, we move beyond the simplistic view that lower complexity is always better. We argue that the nature of this complexity, whether it contributes to brute-force capacity or to adaptive regularization, is the true determinant of generalization. Our theory predicts that a GWO whose complexity is utilized to adaptively align with data structure will achieve a superior generalization bound. Canonical operations and their modern variants emerge as optimal solutions to the IB objective, and our experiments reveal that the quality, not just the quantity, of an operation's complexity governs its performance. The GWO theory thus provides a grammar for creating neural operations and a principled pathway from data properties to generalizable architecture design.
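To make the decomposition concrete, here is a minimal NumPy sketch, not code from the paper: the `gwo`, `conv_like`, and `attention_like` functions, the 1-D sequence setting, and the clipped-border handling are illustrative assumptions. It shows how a convolution-like and an attention-like operation can both be written as choices of Path, Shape, and Weight.

```python
import numpy as np


def gwo(x, path, shape_fn, weight_fn):
    """Hypothetical Generalized Windowed Operation on an (N, D) sequence.

    path(i)            -> indices gathered for output position i     (Path, P)
    shape_fn(i, idx)   -> 0/1 mask imposing geometric structure      (Shape, S)
    weight_fn(i, win)  -> per-element importance of the window       (Weight, W)
    """
    n = x.shape[0]
    out = np.zeros_like(x)
    for i in range(n):
        idx = path(i)                        # P: which positions participate
        window = x[idx]                      # gather the window
        mask = shape_fn(i, idx)              # S: geometric / symmetry structure
        w = weight_fn(i, window) * mask      # W: feature importance
        out[i] = (w[:, None] * window).sum(axis=0)  # weighted aggregation
    return out


def conv_like(x, taps):
    """Convolution-like instance: local Path, dense Shape, position-fixed Weight."""
    n, k = x.shape[0], len(taps)
    half = k // 2
    path = lambda i: np.clip(np.arange(i - half, i - half + k), 0, n - 1)
    shape_fn = lambda i, idx: np.ones(len(idx))      # full local window
    weight_fn = lambda i, window: taps               # same taps at every position
    return gwo(x, path, shape_fn, weight_fn)


def attention_like(x):
    """Attention-like instance: global Path, dense Shape, content-dependent Weight."""
    n, d = x.shape
    path = lambda i: np.arange(n)                    # every position is visible
    shape_fn = lambda i, idx: np.ones(len(idx))      # no geometric restriction

    def weight_fn(i, window):
        scores = window @ x[i] / np.sqrt(d)          # dot-product similarity
        scores -= scores.max()                       # numerical stability
        e = np.exp(scores)
        return e / e.sum()                           # softmax over the window

    return gwo(x, path, shape_fn, weight_fn)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 4))
    print(conv_like(x, taps=np.array([0.25, 0.5, 0.25])).shape)  # (8, 4)
    print(attention_like(x).shape)                               # (8, 4)
```

In this reading, the convolution-like instance fixes a small local Path and reuses the same Weight at every position, while the attention-like instance opens the Path to the whole sequence and lets the Weight depend on the content of the window; the Shape component is left as a trivial dense mask here, but could encode dilation, sparsity, or other symmetry assumptions.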

