Foundation Models

We believe cancer's complexity fundamentally exceeds what humans can parse. So the data we generate is designed not for human interpretation, but as training data for foundation models.

To create these models, we use a masked reconstruction objective: parts of a tumor's transcriptomic, morphologic, or proteomic landscape are hidden from the model, and it learns to predict what's missing given the surrounding spatial context.
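As a rough illustration of what a masked reconstruction objective looks like, here is a minimal sketch on toy data. Everything in it is an assumption made for illustration: the 8x8 grid of "spots", the three synthetic feature channels, the 25% mask fraction, and the function names are hypothetical, and the nearest-neighbor averaging in predict_from_neighbors is only a stand-in for a learned model, not the architecture used here. The point is the shape of the task: hide some spots, predict them from spatial context, and score the prediction only on what was hidden.

```python
import numpy as np


def mask_spots(features, mask_fraction, rng):
    """Hide a random subset of spots; return the corrupted copy and a boolean mask."""
    n_spots = features.shape[0]
    n_masked = max(1, int(round(mask_fraction * n_spots)))
    masked_idx = rng.choice(n_spots, size=n_masked, replace=False)
    mask = np.zeros(n_spots, dtype=bool)
    mask[masked_idx] = True
    corrupted = features.copy()
    corrupted[mask] = 0.0  # masked spots carry no information
    return corrupted, mask


def predict_from_neighbors(corrupted, mask, coords, k=4):
    """Stand-in for a model: fill each masked spot with the mean of its
    k nearest visible spots (hypothetical baseline, not a trained network)."""
    visible_idx = np.flatnonzero(~mask)
    preds = corrupted.copy()
    for i in np.flatnonzero(mask):
        dists = np.linalg.norm(coords[visible_idx] - coords[i], axis=1)
        nearest = visible_idx[np.argsort(dists)[:k]]
        preds[i] = corrupted[nearest].mean(axis=0)
    return preds


def masked_reconstruction_loss(preds, targets, mask):
    """Mean squared error computed over the masked spots only."""
    return float(np.mean((preds[mask] - targets[mask]) ** 2))


rng = np.random.default_rng(0)

# Toy tissue: an 8x8 grid of spots with three spatially smooth feature channels.
xs, ys = np.meshgrid(np.arange(8), np.arange(8))
coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
features = np.stack(
    [
        np.sin(coords[:, 0] / 3.0),
        np.cos(coords[:, 1] / 3.0),
        (coords[:, 0] + coords[:, 1]) / 14.0,
    ],
    axis=1,
)

corrupted, mask = mask_spots(features, mask_fraction=0.25, rng=rng)
preds = predict_from_neighbors(corrupted, mask, coords)

loss = masked_reconstruction_loss(preds, features, mask)
# Reference point: the error of predicting nothing (all zeros) at masked spots.
baseline = masked_reconstruction_loss(corrupted, features, mask)
print(f"masked MSE with spatial context: {loss:.4f}, without: {baseline:.4f}")
```

In an actual training loop, the neighbor-averaging function would be replaced by a trainable model whose parameters are updated to reduce this masked loss. No labels appear anywhere in the objective; the only supervision is the hidden portion of the data itself.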

Although this is an entirely self-supervised task with no explicit labels, models trained on it can generate embedding spaces of tumor samples that mirror meaningful biology.

How can a model, without labels, learn patterns that are so clinically relevant?

The answer lies in how cancer drug development works. Therapies aren't approved for broad, undifferentiated populations, but are instead meant for genetically or phenotypically defined subgroups. The axes that determine patient response are the same axes that structure our tissue-level data.

In other words, the model doesn't need labels, because the therapeutically relevant dimensions are already encoded in the tissue itself.

The machine learning team at Noetik is actively extending this research, working on everything from new model architectures to scaling behavior, mechanistic interpretability, and a lot more! If this seems like something you’d like to work on, reach out to info@noetik.ai!

For more information on how these models are used in practice, read here.

Foundation Models from Human Data
