GSO-2010: Alaa Abi-Haidar and Luis M. Rocha
Collective Classification of Textual Documents Using Self-Organized Cross-Regulatory T-cells in the Adaptive Immune System
The immune system is a distributed collection of molecular constituents with no central controller. Its ability to distinguish self from nonself must therefore result from a self-organizing, collective classification process, defined as the ability of decentralized systems of many components to classify situations that require global information or coordinated action. Nature is full of examples of collective classification: the dynamics of stomata cells on leaf surfaces are known to be statistically indistinguishable from the dynamics of automata capable of performing nontrivial classification; biochemical intracellular signal transduction networks are capable of emergent classification; and quorum sensing in bacteria and social insects is a further example. We can study collective classification in general models of complex systems such as Cellular Automata, namely by identifying regular patterns in the dynamics that store, transmit and process information. Here, instead of looking at general models of complex systems, we focus on a specific immunological model of the self-organizing dynamics of T-cell cross-regulation.
By guiding the self-organizing dynamics of this model through its initial conditions and dynamical parameters, we have (1) built a novel bio-inspired machine learning solution for document classification, and (2) gained a better understanding of how well collections of T-cells engaged in cross-regulation perform as a classifier. The first goal entails a bio-inspired approach to computational intelligence, and the second a computational biology experiment, but both are based on complex systems principles.
Our bio-inspired solution for binary classification of textual documents is inspired by T-cell cross-regulation in the vertebrate adaptive immune system, a complex adaptive system of millions of self-organized cells that interact to distinguish between self and nonself substances. By analogy, automatic document classification assumes that the "interaction" and co-occurrence of thousands of words in text can be used to identify conceptually related classes of documents: at a minimum, two classes containing the relevant and irrelevant documents for a given concept or preference (e.g. articles with protein-protein interaction information, or desirable e-mail, known as ham, in contrast to fraudulent and illegitimate e-mail, known as spam). Our agent-based method for document classification expands the existing analytical model of T-cell cross-regulation by allowing us to deal simultaneously with many distinct populations of antigen-specific T-cells and their collective, decentralized, self-organized dynamics.
Our results are useful for machine learning in general and biomedical text mining in particular, but they also help us understand T-cell cross-regulation as a potential general principle of classification available to collectives of molecules without a central controller, a paradigmatic case of guided self-organization. While there is still much to learn about the specifics of T-cell cross-regulation in adaptive immunity, Artificial Life models such as ours allow us to explore alternative emergent classification principles while producing useful bio-inspired tools.