First Glance at KEGGgraph

Posted on Nov 21, 2022

Tags: KEGG, Graph, Metabolic Network, Microarray

This post is based on

Pathways are stored and presented as graphs on the KEGG server side, where

nodes are molecules (protein, compound, etc)
edges represent relation types between the nodes, e.g., activation or phosphorylation (氧化磷酸化作用)

KEGGgraph offers the following functionalities:

Parsing: one node in KEGG pathway does not necessarily map to merely one gene product, for example, the node “ERK” in the human TGF-Beta signaling pathway contains two homologues, MAPK1 and MAPK3. User can set whether to expand these nodes topologically.
Graph operations: subset and merge
Visualization

the pathway identifier is in the form of [a-z]3[0-9]5, where the three-alphabet code represent the organism and the five digits represent pathway.

Since Auguest 2009, KGML files are stored separately in different subdirectories depending on whether they record metabolic or nonmetabolic pathways. retrieveKGML first tries to download the file from the non-metabolic subdirectory, and tries the metabolic directory in case no file was found in the non-metabolic step.

Parsing and graph feature query

first read in KGML file for human MAPK signaling pathway (with KEGG ID hsa04010)”

mapkKGML = system.file("extdata/hsa04010.xml", package = "KEGGgraph")

we can either parse them into an object of KEGGPathway or an object of graph.

KEGGPathway maintains the information of the pathway (title, link, organism, etc)
graph objects are more natural approach and can be directly plugged in many other tools

library(KEGGgraph)

## 
## Attaching package: 'KEGGgraph'

## The following object is masked from 'package:graphics':
## 
##     plot

## The following object is masked from 'package:base':
## 
##     plot

mapkG = parseKGML2Graph(mapkKGML, expandGenes =  TRUE)
mapkG

## A graphNEL graph with directed edges
## Number of Nodes = 265 
## Number of Edges = 876

alternatively,

mapkpathway = parseKGML(mapkKGML)
mapkpathway

## KEGG Pathway
## [ Title ]: MAPK signaling pathway
## [ Name ]: path:hsa04010
## [ Organism ]: hsa
## [ Number ] :04010
## [ Image ] :http://www.genome.jp/kegg/pathway/hsa/hsa04010.gif
## [ Link ] :http://www.genome.jp/dbget-bin/show_pathway?hsa04010
## ------------------------------------------------------------
## Statistics:
##  136 node(s)
##  171 edge(s)
##  0 reaction(s)
## ------------------------------------------------------------

mapkG2 = KEGGpathway2Graph(mapkpathway, expandGenes = TRUE)

expandGenes: ?? not exactly clear about
genesOnly

Edges in KEGG pathways are directional, not guarantee a reverse relation, although reciprocal edges are also allowed.

extract the node attributes

mapkGnodedata = getKEGGnodeData(mapkG)
mapkGnodedata[[2]]

## KEGG Node (Entry 'hsa:5924'):
## ------------------------------------------------------------
## [ displayName ]: RASGRF1, GRF1...
## [ Name ]: hsa:5924
## [ Type ]: gene
## [ Link ]: http://www.genome.jp/dbget-bin/www_bget?hsa+5923+5924
## ------------------------------------------------------------

alternatively,

getKEGGnodeData(mapkG, 'hsa:5924')

## KEGG Node (Entry 'hsa:5924'):
## ------------------------------------------------------------
## [ displayName ]: RASGRF1, GRF1...
## [ Name ]: hsa:5924
## [ Type ]: gene
## [ Link ]: http://www.genome.jp/dbget-bin/www_bget?hsa+5923+5924
## ------------------------------------------------------------

Similarly,

mapkGedgedata = getKEGGedgeData(mapkG)
mapkGedgedata[[4]]

##   KEGG Edge (Type: PPrel):
## ------------------------------------------------------------
## [ Entry 1 ID ]: hsa:5923
## [ Entry 2 ID ]: hsa:3845
## [ Subtype ]: 
##   [ Subtype name ]: activation
##   [ Subtype value ]: -->
## ------------------------------------------------------------

alternatively,

getKEGGedgeData(mapkG,'hsa:5923~hsa:3845')

##   KEGG Edge (Type: PPrel):
## ------------------------------------------------------------
## [ Entry 1 ID ]: hsa:5923
## [ Entry 2 ID ]: hsa:3845
## [ Subtype ]: 
##   [ Subtype name ]: activation
##   [ Subtype value ]: -->
## ------------------------------------------------------------

Roughly speaking the out-degree reflects the regulatory role, while the in-degree suggests the subjectability of the protein to intermolecular regulations.

mapkGoutdegrees = sapply(edges(mapkG), length)
mapkGindegrees = sapply(inEdges(mapkG), length)
topouts = sort(mapkGoutdegrees, decreasing  = T)
topins = sort(mapkGindegrees, decreasing = T)
topouts[1:3]

## hsa:5594 hsa:5595 hsa:1432 
##       26       26       13

topins[1:3]

##  hsa:5923  hsa:5924 hsa:10125 
##        26        26        26

Graph subset and merge

library(Rgraphviz)

## Loading required package: graph

## Loading required package: BiocGenerics

## 
## Attaching package: 'BiocGenerics'

## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs

## The following objects are masked from 'package:base':
## 
##     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
##     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
##     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
##     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
##     Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
##     table, tapply, union, unique, unsplit, which.max, which.min

## Loading required package: grid

set.seed(123)
randomNodes = sample(nodes(mapkG), 25)
mapkGsub = subGraph(randomNodes, mapkG)
mapkGsub

## A graphNEL graph with directed edges
## Number of Nodes = 25 
## Number of Edges = 6

makeAttr <- function(graph, default, valNodeList) {
  tmp <- nodes(graph)
  x <- rep(default, length(tmp)); names(x) <- tmp
  
  if(!missing(valNodeList)) {
    stopifnot(is.list(valNodeList))
    allnodes <- unlist(valNodeList)
    stopifnot(all(allnodes %in% tmp))
    for(i in seq(valNodeList)) {
      x[valNodeList[[i]]] <- names(valNodeList)[i]
    }
  }
  return(x)
}

outs = sapply(edges(mapkGsub), length) > 0
ins = sapply(inEdges(mapkGsub), length) > 0
ios = outs | ins

translate the KEGG IDs into Gene Symbol

library(org.Hs.eg.db)

## Loading required package: AnnotationDbi

## Loading required package: stats4

## Loading required package: Biobase

## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.

## Loading required package: IRanges

## Loading required package: S4Vectors

## 
## Attaching package: 'S4Vectors'

## The following objects are masked from 'package:Rgraphviz':
## 
##     from, to

## The following objects are masked from 'package:base':
## 
##     expand.grid, I, unname

## 

ioGeneID = translateKEGGID2GeneID(names(ios))
nodesNames = sapply(mget(ioGeneID, org.Hs.egSYMBOL, ifnotfound=NA), "[[", 1)
names(nodesNames) = names(ios)

plot it

# https://www.bioconductor.org/packages/release/bioc/vignettes/KEGGgraph/inst/doc/KEGGgraph.R
nAttrs <- list();
nAttrs$fillcolor <- makeAttr(mapkGsub, "lightgrey", list(orange=names(ios)[ios]))
nAttrs$label <- nodesNames
plot(mapkGsub, "neato", nodeAttrs=nAttrs,
     attrs=list(node=list(fillcolor="lightgreen",
                  width="0.75", shape="ellipse"), 
       edge=list(arrowsize="0.7")))

unnamed-chunk-14-1

merge two graphs

wntKGML = system.file("extdata/hsa04310.xml", package = "KEGGgraph")
wntG = parseKGML2Graph(wntKGML)
graphs = list(mapk = mapkG, wnt = wntG)
merged = mergeGraphs(graphs)
merged

## A graphNEL graph with directed edges
## Number of Nodes = 386 
## Number of Edges = 1628

Using other graph tools

question: which nodes are of the highest importance in MAPK signalling pathway solution: relative betweenness centrality

betweenness if a centrality measure of a node within a graph. Nodes that occur on many shortest paths between other vertices have higher betweenness than those that do not. It is scaled by the factor of $(n-1)(n-2)/2$ to get relative betweenness centrality, where $n$ is the number of nodes in the graph. Both measurements estimate the importance or the role of the node in the graph.

library(RBGL)
bcc = brandes.betweenness.centrality(mapkG)
rbccs = bcc$relative.betweenness.centrality.vertices[1L,]
toprbccs = sort(rbccs, decreasing = TRUE)[1:4]
toprbccs

##  hsa:4214  hsa:2885  hsa:5605  hsa:5604 
## 0.2685233 0.2467144 0.2366450 0.2366450

Parsing chemical compound reaction network

KEGG pathway captures two kinds of network:

the protein network: consists relations (edges) between gene products
the chemical network: the reactions between chemical compounds

The metabolic pathway can be viewed both as a network of proteins (enzymes) and as a network of chemical compounds, whereas regulatory pathways are always viewed as protein networks only.

check the example of Glycine, serine and threonine metabolism pathway

mapfile = system.file("extdata/map00260.xml", package = "KEGGgraph")
map = parseKGML(mapfile)
map

## KEGG Pathway
## [ Title ]: Glycine, serine and threonine metabolism
## [ Name ]: path:map00260
## [ Organism ]: map
## [ Number ] :00260
## [ Image ] :http://www.genome.jp/kegg/pathway/map/map00260.gif
## [ Link ] :http://www.genome.jp/dbget-bin/show_pathway?map00260
## ------------------------------------------------------------
## Statistics:
##  144 node(s)
##  371 edge(s)
##  68 reaction(s)
## ------------------------------------------------------------

reactions = getReactions(map)
reactions[[1]]

## KEGG Reaction(rn:R08211)
## ------------------------------------------------------------
## [ Name ]: rn:R08211
## [ Type ]: irreversible
## [ Substrate Name ]: cpd:C00576
## [ Product Name ]: cpd:C00719

Expand embedded pathways

mapkGembed = parseKGMLexpandMaps(mapkKGML)

only consider gene after using genesOnly = FALSE,

mapkGall = parseKGML2Graph(mapkKGML, genesOnly = FALSE)
mapkGall

## A graphNEL graph with directed edges
## Number of Nodes = 277 
## Number of Edges = 891

mapkGsub = subGraphByNodeType(mapkGall, "gene")
mapkGsub

## A graphNEL graph with directed edges
## Number of Nodes = 265 
## Number of Edges = 876

Annotation

translate KEGG identifier (KEGGID) into Entrez GeneID,

toprbccKEGGID = names(toprbccs)
toprbccKEGGID

## [1] "hsa:4214" "hsa:2885" "hsa:5605" "hsa:5604"

toprbccGeneID = translateKEGGID2GeneID(toprbccKEGGID)
toprbccGeneID

## [1] "4214" "2885" "5605" "5604"

To convert GeneID to other identifiers, the authors recommend genome wide annotation packages,

for human it is org.Hs.eg.db
for other organisms, http://www.bioconductor.org/packages/release/data/annotation/

Parse created or edited pathway

KGML-ED program is a tool designed for the dynamic visualization, interactive navigation and editing of KEGG pathway diagrams. The program is able to modify or even create de novo new pathways and export them into KGML (KEGG XML) files
KEGGgraph captures the graph information of KEGG PATHWAY by parsing KGML files, it is also capable to parse the KGML files exported from KGML-ED program.

Joint use: allow user edit KGML files from KEGG (e.g., by adding/removing new nodes/edges), or create new pathways from scratch, and then analyse them.

Demonstrate the parse of a toy pathway (network motif) created by KGML-ED, which illustrates a feed-forward loop (FFL)

toyKGML = system.file("extdata/kgml-ed-toy.xml", package = "KEGGgraph")

$ cat /media/weiya/PSSD/Programs/R/4.2.1/lib/R/library/KEGGgraph/extdata/kgml-ed-toy.xml
<?xml version="1.0" encoding="UTF-8"?>
<pathway name="path:KGML_ED_TOY_00001" org="hsa" number="00001" title="KGML-ED toy pathway" image="" link="">
  <entry id="1" name="CRP" type="gene">
    <graphics name="CRP" fgcolor="#000000" bgcolor="#FFFFFF" type="circle" x="210" y="180" width="120" height="60" />
  </entry>
  <entry id="2" name="GalS" type="gene">
    <graphics name="GalS" fgcolor="#000000" bgcolor="#FFFFFF" type="circle" x="450" y="180" width="120" height="60" />
  </entry>
  <entry id="3" name="galETK" type="gene">
    <graphics name="galETK" fgcolor="#000000" bgcolor="#FFFFFF" type="circle" x="450" y="360" width="120" height="60" />
  </entry>
  <entry id="4" name="cAMP" type="compound">
    <graphics name="cAMP" fgcolor="#000000" bgcolor="#FFFFFF" type="circle" x="100" y="90" width="60" height="60" />
  </entry>
  <entry id="5" name="galactose" type="compound">
    <graphics name="galactose" fgcolor="#000000" bgcolor="#FFFFFF" type="circle" x="590" y="90" width="60" height="60" />
  </entry>
  <relation entry1="1" entry2="2" type="PPrel">
    <subtype name="activation" value="-->"/>
  </relation>
  <relation entry1="2" entry2="3" type="GErel">
    <subtype name="repression" value="--|"/>
  </relation>
  <relation entry1="1" entry2="3" type="GErel">
    <subtype name="expression" value="--|"/>
  </relation>
  <relation entry1="4" entry2="1" type="PCrel">
    <subtype name="activation" value="-->"/>
  </relation>
  <relation entry1="5" entry2="2" type="PCrel">
    <subtype name="activation" value="-->"/>
  </relation>
  <relation entry1="2" entry2="2" type="PPrel">
    <subtype name="inhibition" value="--|"/>
  </relation>
</pathway>

toyGraph = parseKGML2Graph(toyKGML, genesOnly = FALSE)
toyGraph

## A graphNEL graph with directed edges
## Number of Nodes = 5 
## Number of Edges = 6

nodes(toyGraph)

## [1] "CRP"       "GalS"      "galETK"    "cAMP"      "galactose"

library(Rgraphviz)
plot(toyGraph)

toyGraph-plain-1-1

alternatively, use

plotKEGGgraph(toyGraph)

render the attributes

nodeInfo = getKEGGnodeData(toyGraph)
nodeType = sapply(nodeInfo, getType)
makeNodeRenderAttrs = function(g, label = nodes(g), shape = "ellipse", fill="#e0e0e0", ...) {
    rv = list(label = label, shape = shape, fill = fill, ...)
    nA = nodeRenderInfo(g)
    for (i in seq(along = rv)) {
        if (length(rv[[i]]) == 1) {
            rv[[i]] = rep(rv[[i]], numNodes(g))
        } else {
            if (length(rv[[i]]) != numNodes(g))
                stop("Attributes vector must have as many elements as 'g' has nodes.")
        }
        names(rv[[i]]) = nodes(g)
        nA[[ names(rv)[[i]] ]] = rv[[i]]
    }
    nodeRenderInfo(g) = nA
    return(g)
}
toyDrawn = plotKEGGgraph(toyGraph)

toyDrawnRefine = makeNodeRenderAttrs(toyDrawn, fill = ifelse(nodeType == "gene", "lightblue", "orange"),
                                     shape = ifelse(nodeType  == "gene", "ellipse", "rectangle"))
renderGraph(toyDrawnRefine)

toyGraph-color-2

Microarray analysis

KEGG PATHWAY contains pathway maps for nearly 1000 organisms and has been intensively expanded with pathways including signaling transduction, cellular processes and human diseases in recent years.

The microarray data is Affemetric GeneChip (GEO accession: GSE4107), containing 10 normal samples and 12 colorectal cancer samples, is normalized with affy and limma package. The result of topTable in limma is used as the input data.

load(system.file("extdata/colorectalcancerSPIA.RData", package="KEGGgraph"))
head(top)

##                ID    logFC  AveExpr        t      P.Value    adj.P.Val        B
## 10738   201289_at 5.960206 6.226928 23.94388 1.789221e-17 9.782565e-13 25.40124
## 18604   209189_at 5.143502 7.487305 17.42995 1.560212e-14 2.843486e-10 21.02120
## 11143 201694_s_at 4.148081 7.038281 16.46040 5.153117e-14 7.043667e-10 20.14963
## 10490 201041_s_at 2.429889 9.594413 14.06891 1.293706e-12 1.414667e-08 17.66883
## 10913 201464_x_at 1.531126 8.221044 10.98634 1.685519e-10 1.151947e-06 13.61189
## 11463   202014_at 1.429269 5.327647 10.45906 4.274251e-10 2.418771e-06 12.80131

FC indicates Fold Change, a measure describing how much a quantity changes between an original and a subsequent measurement.

Entrez is a molecular biology database system that provides integrated access to nucleotide and protein sequence data, gene-centered and genomic mapping information, 3D structure data, PubMed MEDLINE, and more.

整理好了表达矩阵以后，我们需要将探针的id转换成为基因的Gene symbol。对于探针id的转换过程，目前主要是通过R包来进行转换。随着芯片平台的普遍使用，其基因的注释信息也被整理成了不同的R包；因此，通常情况下我们使用R包来注释。不同的平台，对应着不同的R包。首先，我们来看一下这个数据集使用的平台类型。

该芯片使用的是我们最常见的平台，GPL570 对于GPL570，其对应的R包是hgu133plus2.db包

source: https://zhuanlan.zhihu.com/p/340601138

基因探针，即核酸探针，是一段带有检测标记，且顺序已知的，与目的基因互补的核酸序列(DNA或RNA)。基因探针通过分子杂交与目的基因结合，产生杂交信号，能从浩瀚的基因组中把目的基因显示出来。

source: https://baike.baidu.com/item/%E5%9F%BA%E5%9B%A0%E6%8E%A2%E9%92%88

library(hgu133plus2.db)

##

ls("package:hgu133plus2.db")

##  [1] "hgu133plus2"              "hgu133plus2_dbconn"      
##  [3] "hgu133plus2_dbfile"       "hgu133plus2_dbInfo"      
##  [5] "hgu133plus2_dbschema"     "hgu133plus2.db"          
##  [7] "hgu133plus2ACCNUM"        "hgu133plus2ALIAS2PROBE"  
##  [9] "hgu133plus2CHR"           "hgu133plus2CHRLENGTHS"   
## [11] "hgu133plus2CHRLOC"        "hgu133plus2CHRLOCEND"    
## [13] "hgu133plus2ENSEMBL"       "hgu133plus2ENSEMBL2PROBE"
## [15] "hgu133plus2ENTREZID"      "hgu133plus2ENZYME"       
## [17] "hgu133plus2ENZYME2PROBE"  "hgu133plus2GENENAME"     
## [19] "hgu133plus2GO"            "hgu133plus2GO2ALLPROBES" 
## [21] "hgu133plus2GO2PROBE"      "hgu133plus2MAP"          
## [23] "hgu133plus2MAPCOUNTS"     "hgu133plus2OMIM"         
## [25] "hgu133plus2ORGANISM"      "hgu133plus2ORGPKG"       
## [27] "hgu133plus2PATH"          "hgu133plus2PATH2PROBE"   
## [29] "hgu133plus2PFAM"          "hgu133plus2PMID"         
## [31] "hgu133plus2PMID2PROBE"    "hgu133plus2PROSITE"      
## [33] "hgu133plus2REFSEQ"        "hgu133plus2SYMBOL"       
## [35] "hgu133plus2UNIPROT"

then we can obtain the correspondence between probe id and gene symbol

ids = toTable(hgu133plus2SYMBOL)
head(ids)

##    probe_id symbol
## 1 1007_s_at   DDR1
## 2   1053_at   RFC2
## 3    117_at  HSPA6
## 4    121_at   PAX8
## 5 1255_g_at GUCA1A
## 6   1294_at   UBA7

Only consider the probes annotated with EntrezGeneID and differentially expressed with FDR p-value less than 0.05.

x = hgu133plus2ENTREZID
x

## ENTREZID map for chip hgu133plus2 (object of class "ProbeAnnDbBimap")

x[top$ID]

## ENTREZID submap for chip hgu133plus2 (object of class "ProbeAnnDbBimap")

top$ENTREZ = unlist(as.list(x[top$ID]))

remove NA

top = top[!is.na(top$ENTREZ),]
top = top[!duplicated(top$ENTREZ),]
tg1 = top[top$adj.P.Val < 0.05, ]
DE_Colorectal = tg1$logFC
names(DE_Colorectal) = as.vector(tg1$ENTREZ)
All_Colorectal = top$ENTREZ

length(DE_Colorectal)

## [1] 3630

length(All_Colorectal)

## [1] 17716

The SPIA algorithm takes two vectors as input:

DE_Colorectal: log2 fold changes of differentially expressed genes
All_Colorectal: those of all EntrezGeneID annotated genes

and it will return a table of pathways ranked from the most to the least significant. The results show that Colorectal cancer pathway (ID:05210) is significantly activated (ranked the 5th of the result table).

Retrieve the KGML remotely from the KEGG FTP site

tmp = "hsa05210.xml"
retrieveKGML(pathwayid = "05210", organism = "hsa", destfile = tmp)

The file has been attached in the KEGGgraph package. Parse it into graph and observe that around 30% percent of the genes in the pathway are differentially expressed.

colFile = system.file("extdata/hsa05210.xml", package = "KEGGgraph")
g = parseKGML2Graph(colFile)
deKID = translateGeneID2KEGGID(names(DE_Colorectal))
allKID = translateGeneID2KEGGID(All_Colorectal)
isDiffExp = nodes(g) %in% deKID
sprintf("%2.2f%% genes differentially-expressed", mean(isDiffExp)*100)

## [1] "29.76% genes differentially-expressed"

visualize the pathway

plot(g)

pathway-plain-1

# https://www.bioconductor.org/packages/release/bioc/vignettes/KEGGgraph/inst/doc/KEGGgraphApp.R
library(RColorBrewer)
ar = 20
cols = rev(colorRampPalette(brewer.pal(6, "RdBu"))(ar))
logfcs = DE_Colorectal[match(nodes(g), deKID)]
names(logfcs) = nodes(g)
logfcs[is.na(logfcs)] = 0
incol = round((logfcs + 2)*5)
incol[incol > ar] = ar
undetected = !nodes(g) %in% allKID
logcol = cols[incol]
logcol[logfcs == 0] = "darkgrey"
logcol[undetected] = "yellow"
names(logcol) = names(logfcs)
nA = makeNodeAttrs(g, fillcolor = logcol, label = "", width = 10, height = 1.2)
layout(mat = matrix(c(rep(1, 8), 2), ncol = 1, byrow = TRUE)) ## very important!!! without it, the circles reduce to a line, it is (almost) equivalent to `c(rep(1, 1), 2)`.
plot(g, "dot", nodeAttrs = nA)
image(as.matrix(seq(1, ar)), col = cols, yaxt = "n", xaxt = "n")

pathway-color-1

Not all the genes are connected with each other, 34/84 nodes in the pathway have a degree of 0

d = degree(g)
sum(d$inDegree == 0 & d$outDegree == 0)

## [1] 34

# https://www.bioconductor.org/packages/release/bioc/vignettes/KEGGgraph/inst/doc/KEGGgraphApp.R
gDeg = degree(g)
gIsSingle = gDeg[[1]] + gDeg[[2]] == 0
options(digits=3)
gGeneID = translateKEGGID2GeneID(nodes(g))
gSymbol = sapply(gGeneID, function(x) mget(x, org.Hs.egSYMBOL, ifnotfound = NA)[[1]])
isUp = logfcs > 0
isDown = logfcs < 0
singleUp = isUp & gIsSingle
singleDown = isDown & gIsSingle

they are not connected to any other node. There are several reasons for this.

genes with genetic alterations are indicated in the pathway map, but not interweaved into the pathway
in some pathways, especially human disease pathways, KEGG records genes involved in the disease but does not provide any edge in that pathway, although one can find their interaction partners in other pathways, e.g., Grb2 has no edges in colorectal cancer pathway, however in other maps including MAPK signaling pathway, both up- and down-stream interactors are found.
- one solution is to merge (union) the related pathways together: mergeKEGGgraphs

From the visualization, it is difficult to recognize any patterns. To examine the pathway, use Divide and Conquer strategy, namely to subset the graph into subgraphs: subKEGGgraph

ups = nodes(g)[logfcs > 0]
upNs = unique(unlist(neighborhood(g, ups, return.self = TRUE)))
upSub = subKEGGgraph(upNs, g)
upNeighbor = nodes(upSub)[sapply(neighborhood(upSub, nodes(upSub)), length) > 0]
upNeighbor = setdiff(upNeighbor, nodes(g)[undetected])
upSub = subKEGGgraph(upNeighbor, upSub)
upSubGID = translateKEGGID2GeneID(nodes(upSub))
upSymbol = gSymbol[upSubGID]
upnA = makeNodeAttrs(upSub, fillcolor = logcol[nodes(upSub)], label = upSymbol, fixedsize = TRUE, width = 10, height = 10, font = 20)
plot(upSub, "dot", nodeAttrs = upnA)

subgraph-color-1

Use degree centrality (the sum of in- and out-degree of each node) as a measure of the relative importance of the node compared to others.

8 nodes have an degree equal or larger than three, and 5 of them are upregulated to different extent.

sessionInfo()

## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/libf77blas.so.3.10.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=zh_CN.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.utf8         LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    grid      stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] RColorBrewer_1.1-3    hgu133plus2.db_3.13.0 RBGL_1.74.0          
##  [4] org.Hs.eg.db_3.16.0   AnnotationDbi_1.60.0  IRanges_2.32.0       
##  [7] S4Vectors_0.36.0      Biobase_2.58.0        Rgraphviz_2.42.0     
## [10] graph_1.76.0          BiocGenerics_0.44.0   KEGGgraph_1.58.0     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.9             highr_0.9              GenomeInfoDb_1.34.3   
##  [4] XVector_0.38.0         compiler_4.2.1         zlibbioc_1.44.0       
##  [7] bitops_1.0-7           tools_4.2.1            digest_0.6.30         
## [10] bit_4.0.5              evaluate_0.18          RSQLite_2.2.18        
## [13] memoise_2.0.1          pkgconfig_2.0.3        png_0.1-7             
## [16] rlang_1.0.6            cli_3.4.1              DBI_1.1.3             
## [19] rstudioapi_0.14        yaml_2.3.6             xfun_0.35             
## [22] fastmap_1.1.0          GenomeInfoDbData_1.2.9 stringr_1.4.1         
## [25] httr_1.4.4             knitr_1.41             Biostrings_2.66.0     
## [28] vctrs_0.5.1            bit64_4.0.5            R6_2.5.1              
## [31] XML_3.99-0.12          rmarkdown_2.18         blob_1.2.3            
## [34] magrittr_2.0.3         htmltools_0.5.3        KEGGREST_1.38.0       
## [37] stringi_1.7.8          RCurl_1.98-1.9         cachem_1.0.6          
## [40] crayon_1.5.2

Published in categories Note

← previous next →

See all posts →

WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

First Glance at KEGGgraph

Posted on Nov 21, 2022

Parsing and graph feature query

Graph subset and merge

Using other graph tools

Parsing chemical compound reaction network

Expand embedded pathways

Annotation

Parse created or edited pathway

Microarray analysis