DNE: A Deep Machine Learning-based Network Encoder that Uncovers Non-linear Gene-gene Functional Associations at Systems Level


Summary

Deep Network Encoder (DNE) is a novel network reverse-engineering method developed in Hu Li's lab. It harnesses the power of autoencoders to uncover non-linear gene-gene associations learned from high-dimensional omics data. Genes do not act alone but depend on one another to execute their functions. These mutual interactions are the basis for the non-linear nature of gene-gene functional associations in a biological network. We realized that the architectures of deep learning models are analogous to real-world networks (e.g. biological networks): the information encoded in the weights connecting the nodes of a learned model can be used to decipher non-linear relations between features (e.g. gene-gene associations). That is, deep neural network models can serve as knowledge discovery platforms that uncover non-linear relations between real-world features. With this key insight, we developed DNE to decipher non-linear relations between features in high-dimensional data. We devised association scores in which the non-linear association for a pair of genes is inferred from the weights of all possible paths that connect these genes from input to output nodes. The highly associated gene pairs are then assembled into a reverse-engineered biological network.
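The exact scoring formula is not given on this page, so the following is only a minimal sketch of one way a path-based score could be computed. It assumes an autoencoder with a single hidden layer, multiplies the two weights along each input-to-hidden-to-output path, and sums over the hidden nodes. The function name `path_association_scores`, the symmetrisation step, and the decision to ignore the activation function are all illustrative assumptions, not DNE's published method.

```python
import numpy as np

def path_association_scores(w_in, w_out):
    """Score gene pairs from the weights of a one-hidden-layer autoencoder.

    w_in  : (n_genes, n_hidden) weights from input layer to hidden layer
    w_out : (n_hidden, n_genes) weights from hidden layer to output layer

    Assumption: the score of (i, j) is the sum over hidden nodes k of
    w_in[i, k] * w_out[k, j], i.e. one product per path i -> k -> j.
    """
    paths = w_in @ w_out                 # entry (i, j) sums over all paths i -> k -> j
    scores = (paths + paths.T) / 2.0     # assumption: average both directions of a pair
    np.fill_diagonal(scores, 0.0)        # a gene paired with itself is not of interest
    return scores

# Toy weights for 3 genes and 2 hidden nodes (made-up numbers).
w_in = np.array([[1.0, 0.0],
                 [0.5, -1.0],
                 [0.0, 2.0]])
w_out = np.array([[2.0, 1.0, 0.0],
                  [0.0, -1.0, 3.0]])
scores = path_association_scores(w_in, w_out)
print(scores)
```

The matrix product is just a compact way of enumerating every path through the hidden layer. A deeper model would need one product per layer, and a score that accounts for the activation functions would have to be defined differently.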

DNE workflow

Overview of the Deep Network Encoder (DNE) algorithm, using breast cancer prognosis as an illustrative example. (a) Model building and training. Gene expression profiles of breast cancer patients with known prognostic outcomes were assigned to good and poor prognosis groups according to disease relapse-free survival (DRFS), which indicates how long patients survive without cancer relapse. Separate models were built for good and poor prognosis. The models were trained with an autoencoder algorithm, in which the input vector represents all genes present in the transcriptomics data and the expression value of each gene corresponds to its respective node in the input layer. The dimensionality of the output layer (i.e. number of nodes) is the same as that of the input layer. The aim of autoencoder training is to reconstruct the values of the input layer at the output layer. The output vector generated by the output layer is compared with the input vector to compute the reconstruction error. Training is repeated, updating the weights connecting nodes (or neurons) from the input layer to the hidden layer and from the hidden layer to the output layer via the backpropagation algorithm, and halts when the reconstruction error shows no further improvement. (b) Network encoding via the trained model. The weights connecting all nodes (neurons) from the input to the output layer of a trained autoencoder model were used to encode non-linear gene-gene associations with an association scoring scheme. The computed association scores for all gene pairs were collected in an association score matrix, from which the gene pairs with the top 200 absolute scores were selected. Genes that occur multiple times in these top 200 gene pairs served as "seeds" to agglomerate gene pairs into a network.
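The two stages in the caption can be sketched end to end on toy data. This is a hedged illustration, not the DNE scripts. It assumes a single tanh hidden layer, a linear output layer, mean squared reconstruction error, plain gradient descent, and a patience-based stopping rule. The function names (`train_autoencoder`, `top_pairs_and_seeds`), all hyperparameters, the random toy data, and the path-product score used in stage (b) are hypothetical stand-ins.

```python
import numpy as np
from collections import Counter

def train_autoencoder(x, n_hidden=4, lr=0.1, max_epochs=2000, patience=20, seed=0):
    """Stage (a): fit a one-hidden-layer autoencoder to x (n_samples, n_genes).

    Stops once the reconstruction error has not improved for `patience` epochs.
    """
    rng = np.random.default_rng(seed)
    n_genes = x.shape[1]
    w_in = rng.normal(0.0, 0.1, (n_genes, n_hidden))
    w_out = rng.normal(0.0, 0.1, (n_hidden, n_genes))
    history, best, stale = [], np.inf, 0
    for _ in range(max_epochs):
        hidden = np.tanh(x @ w_in)               # input layer -> hidden layer
        recon = hidden @ w_out                   # hidden layer -> output layer
        err = recon - x
        loss = float(np.mean(err ** 2))          # reconstruction error
        history.append(loss)
        if loss < best - 1e-8:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
        # Backpropagation of the mean squared error through both layers.
        grad_recon = 2.0 * err / err.size
        grad_w_out = hidden.T @ grad_recon
        grad_hidden = (grad_recon @ w_out.T) * (1.0 - hidden ** 2)
        grad_w_in = x.T @ grad_hidden
        w_in -= lr * grad_w_in
        w_out -= lr * grad_w_out
    return w_in, w_out, history

def top_pairs_and_seeds(scores, genes, k):
    """Stage (b): keep the k pairs with the largest |score|; seeds are genes that recur."""
    i, j = np.triu_indices(len(genes), k=1)      # each unordered pair exactly once
    order = np.argsort(-np.abs(scores[i, j]))[:k]
    pairs = [(genes[a], genes[b], float(scores[a, b]))
             for a, b in zip(i[order], j[order])]
    counts = Counter(g for a, b, _ in pairs for g in (a, b))
    seeds = sorted(g for g, c in counts.items() if c > 1)
    return pairs, seeds

# Toy expression matrix: 40 samples, 6 genes driven by 2 hidden factors (made-up data).
rng = np.random.default_rng(1)
x = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(40, 6))
x = (x - x.mean(axis=0)) / x.std(axis=0)
genes = [f"gene_{n}" for n in range(6)]

w_in, w_out, history = train_autoencoder(x)
paths = w_in @ w_out                             # assumption: score = summed path products
scores = (paths + paths.T) / 2.0
pairs, seeds = top_pairs_and_seeds(scores, genes, k=5)   # the caption uses k = 200
print(pairs)
print(seeds)
```

In the workflow described above, this would be run once per phenotype group (good and poor prognosis), giving one network per group. The seed genes then act as hubs around which the selected pairs are merged.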

Results

Explore gene association networks derived from autoencoder models for phenotype groups here.

Download

Download the scripts and sample dataset to run DNE on your local Linux system. Extract the scripts and sample dataset into the same folder and read the README.txt file to get started.

Support

For support with DNE, please post to our web forum.

Citation

Manuscript in preparation.


© 2016 H Li • All Rights Reserved