A human glomerular SAGE transcriptome database

Background To facilitate the identification of gene products important in regulating renal glomerular structure and function, we have produced an annotated transcriptome database for normal human glomeruli using the SAGE approach. Description The database contains 22,907 unique SAGE tag sequences, with a total tag count of 48,905. For each SAGE tag, the ratio of its frequency in glomeruli relative to that in 115 non-glomerular tissues or cells, a measure of transcript enrichment in glomeruli, was calculated. A total of 133 SAGE tags representing well-characterized transcripts were enriched 10-fold or more in glomeruli compared to other tissues. Comparison of data from this study with a previous human glomerular Sau3A-anchored SAGE library reveals that 47 of the highly enriched transcripts are common to both libraries. Among these are the SAGE tags representing many podocyte-predominant transcripts like WT-1, podocin and synaptopodin. Enrichment of podocyte transcript tags in this SAGE library indicates that other SAGE tags observed at much higher frequencies in this glomerular library than in non-glomerular SAGE libraries are likely to be glomerulus-predominant. A higher level of mRNA expression for 19 transcripts represented by glomerulus-enriched SAGE tags was verified by RT-PCR comparing glomeruli to lung, liver and spleen. Conclusion The database can be retrieved from, or interrogated online at http://cgap.nci.nih.gov/SAGE. The annotated database is also provided as an additional file with gene identification for 9,022 tags, and matches to the human genome or transcript homologs in other species for 1,433 tags. It should be a useful tool for in silico mining of glomerular gene expression.


Background
Renal glomeruli are highly specialized capillary tufts that produce a nearly protein-free ultrafiltrate of plasma at a rate of about 30 plasma volumes daily. Several hereditary, immune-mediated and metabolic disorders cause glomerular injury and proteinuria, and can lead to renal failure. The three intrinsic glomerular cell types, podocytes, mesangial cells, and glomerular endothelial cells (EC), are highly specialized. Podocytes extend an elaborate array of actin-rich foot processes around the exterior of the glomerular capillary loops, forming a scaffold with nephrin-based filtration slit diaphragms spanning the space between adjacent foot processes [1]. Mesangial cells are pericyte-like cells that, unlike most other pericytes, form an interstitium within the intracapillary space [2]. Glomerular EC are packed with transcellular fenestrae ringed by actin [3,4].
The fenestrae serve the high glomerular capillary wall hydraulic conductivity [5], while a glycocalyx covering the glomerular EC and podocytes together with the podocyte filtration slit diaphragm impede the movement of plasma proteins [6][7][8] across the glomerular capillary wall.
Transcriptome and proteomic approaches are helping to define genes highly expressed and/or enriched in glomeruli [9][10][11][12]. For instance, Sau3A-anchored SAGE databases have been built with RNA extracted from microdissected nephron segments, and enrichment of several glomerular transcripts relative to other nephron segments has been reported [9]. Furthermore, many proteins uniquely expressed by, or enriched in podocytes have been identified over the past decade and their specific functions are increasingly well defined [13]. Finally, by analysis of ESTs enriched in glomeruli, ehd3 was shown to be the first transcript expressed exclusively by glomerular EC [12].
The current study sought to extend previous transcriptome-based work by building a human glomerular Long-SAGE database that can be interrogated directly online. SAGE is based on the principle that a small (here 17 bp) tag sequence immediately 3' of an "anchoring" restriction site is a unique identifier of each transcript [14]. The frequency of specific SAGE tags relative to the total pool of tags reflects their abundance in the source mRNA. In silico comparison of SAGE libraries from diverse tissues can then be used to discover differential expression of transcripts [15]. The level of expression of any specific transcript can also be probed in silico by interrogating SAGE libraries with the transcript's unique SAGE tag sequence.
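The tag principle can be illustrated with a minimal Python sketch. It assumes the input is a cDNA sequence written 5' to 3' in the sense orientation, takes the 3'-most NlaIII site (CATG), and returns the 17 nt that follow it. The function name and the example sequence are hypothetical and are not part of the SAGE Genie software.

```python
from typing import Optional

ANCHOR = "CATG"   # NlaIII recognition site
TAG_LENGTH = 17   # long-SAGE tag length used for this library


def long_sage_tag(cdna: str) -> Optional[str]:
    """Return the 17-nt tag immediately 3' of the last CATG, or None.

    None is returned when the sequence has no NlaIII site or when fewer
    than 17 nt follow the 3'-most site.
    """
    seq = cdna.upper()
    site = seq.rfind(ANCHOR)          # 3'-most anchoring site
    if site == -1:
        return None
    start = site + len(ANCHOR)
    tag = seq[start:start + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None


# Hypothetical sequence with two CATG sites followed by a poly(A) tail.
example = "GGGCATGAAACCCGGGTTTCATGACGTACGTACGTACGTAGGGAAAAAAAAAA"
print(long_sage_tag(example))
```

Only the site nearest the 3' end contributes a tag here, which is why a transcript lacking any CATG cannot be represented in an NlaIII-anchored library.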
We report here the gene expression profile of transcripts for human glomeruli and compare them to pooled SAGE libraries for non-glomerular tissues and cells. Many of the most highly enriched glomerular transcripts reported here were previously found in a Sau3A-anchored glomerular library [9]. Nonetheless, the current SAGE database contains additional glomerulus-enriched transcript tags, and since it is NlaIII-anchored it now allows direct comparison with many non-renal SAGE libraries. The data should serve as a useful resource for investigators studying glomerular gene expression.

Cells and Tissues
Human kidney tissue was obtained from the uninvolved portion of tumor nephrectomy specimens (Human Subjects Protocols: #6196 University of Alberta and #155/97 University of Frankfurt). Patient 1 was a 45-year-old Caucasian female, patient 2 a 72-year-old Caucasian male. Renal tissue was collected only from patients in whom the serum creatinine was within normal limits, and in whom diabetes mellitus, hypertension, and proteinuria were absent. Specific parameters were not collected for individual patients. The relative purity of isolated glomeruli and the normal histological appearance of kidney cortex used in this study are shown in Figure 1. That the cDNA template used for SAGE contained mRNA representing glomerular capillary endothelium is shown by RT-PCR amplification of PECAM-1 (836 bp) and the non-integrin laminin receptor LAMR1 (256 bp). Greater synaptopodin transcript abundance in glomerular mRNA than in whole kidney cortex mRNA from the same specimen also shows appropriate enrichment of the source mRNA in glomerular podocyte transcripts (Figure 1). Amplification of the long PECAM-1 sequence furthermore shows that the source mRNA was intact. Sufficient mRNA for construction of the SAGE library was only obtained from patient 2. The integrity of this source material was also verified by Agilent 2100 microfluidics analysis (data not shown).

Human Glomerular SAGE Library Construction
For SAGE, human glomeruli were isolated by sieving in ice-cold phosphate buffered saline (PBS) from the kidney of a 72-year-old male using minor modifications of the protocol for rat glomeruli [16,17]. Glomeruli were immediately placed into RNA-Protect (Qiagen, Valencia, CA) followed by isolation of 4.7 μg total RNA with the RNeasy kit (Qiagen). A SAGE library was then custom-constructed by Genzyme Corporation (Framingham, MA) using the "long" SAGE protocol, producing 17 bp SAGE tags with the CATG (NlaIII) anchoring restriction site [18]. A total of 2,304 clones containing concatenated ditags were sequenced, resulting in 48,926 tags. Of these, 1,361 were derived from duplicate ditags. Tags from duplicate ditags were not removed [19]. Twenty-one tags were removed because they contained ambiguous (N) nucleotides, leaving 48,905 tags and 22,907 unique long SAGE tags for analysis.
Tag sequences and their absolute counts in 115 distinct "long" SAGE libraries were retrieved from cgap.nci.nih.gov/SAGE (98,944,923 tags). All "long" SAGE libraries available before July 1, 2008 were included without selection. Tissues and cells represented include normal brain, breast, skin, pancreas, bladder, gallbladder, uterus, vein, testis, white blood cells, lung macrophages and embryonic stem cells, as well as malignant tumors including colon and lung adenocarcinoma and melanoma, among others. The frequency of each tag (count/total tag number) was calculated for each of the 115 libraries and expressed as tags per million (TPM). The mean TPM for the non-glomerular libraries (Pool TPM) is reported. The ratio of the glomerular TPM to the Pool TPM was then calculated to establish the degree of enrichment of specific tags in glomeruli (Ratio G: P). Statistical comparison of the pooled libraries with the current glomerular SAGE library was based on Chi-square analysis using absolute tag counts [20]. Comparison to a human kidney SAGE library (SAGE_Kidney_normal_B_1, from cgap.nci.nih.gov/SAGE) was based on the short (10 bp) tag sequences.
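The calculations described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' analysis code: the Pool TPM is taken as the unweighted mean of the per-library TPM values, a Pool TPM of zero is reported as infinite enrichment, and the Chi-square test is written as a plain Pearson test on a 2 x 2 table without continuity correction (the exact variant used in [20] is not given here). All function names are hypothetical.

```python
import math
from typing import List, Tuple


def tags_per_million(count: int, library_total: int) -> float:
    """Tag frequency (count / total tags in the library) scaled to TPM."""
    return count / library_total * 1_000_000


def pool_tpm(counts: List[int], totals: List[int]) -> float:
    """Mean of the per-library TPM values over the non-glomerular libraries."""
    values = [tags_per_million(c, t) for c, t in zip(counts, totals)]
    return sum(values) / len(values)


def ratio_g_to_p(glomerular_tpm: float, pool: float) -> float:
    """Ratio G:P; assumption: a tag absent from the pool gives infinity."""
    return glomerular_tpm / pool if pool > 0 else math.inf


def chi_square_2x2(tag_a: int, total_a: int,
                   tag_b: int, total_b: int) -> Tuple[float, float]:
    """Pearson chi-square (1 df) on absolute counts; returns (chi2, p)."""
    if tag_a + tag_b == 0:
        return 0.0, 1.0               # tag absent from both: nothing to test
    observed = [[tag_a, total_a - tag_a], [tag_b, total_b - tag_b]]
    rows = [total_a, total_b]
    grand = total_a + total_b
    cols = [tag_a + tag_b, grand - tag_a - tag_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / grand
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# A tag seen 4 times among the 48,905 glomerular tags:
print(round(tags_per_million(4, 48_905), 1))
```

For the statistical comparison, `tag_b` and `total_b` would be the summed absolute counts of the pooled libraries, which is how the sketch interprets "absolute tag counts".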
For each SAGE tag, identification was based on the "Hs_long.best_gene.gz" database found at ftp://ftp1.nci.nih.gov/pub/SAGE/HUMAN/. The SAGE Genie algorithm for identifying the best gene match for SAGE tags was reported by Boon et al. [21]. For some tags, the blastn algorithm at http://blast.ncbi.nlm.nih.gov/Blast.cgi was used to match tag sequences that could not be assigned by the "Hs_long.best_gene.gz" database. For these, the SAGE tag had to be in the +/+ orientation with the corresponding mRNA or EST, and fully match the 17 bp sequence immediately 3' of the NlaIII site nearest the poly(A)+ tail or a stretch of > 8 A's as previously reported [22]. Positive identification based on this latter search strategy is indicated in additional file 1 by asterisks.

RT-PCR Analysis
For quantitative RT-PCR, glomeruli were microdissected from distinct pre-transplant kidney biopsy specimens obtained from three separate donors aged 57, 59 and 63 at the University of Göteborg (Human Subjects Protocol #653-05). Immediately after biopsy, one half of one biopsy core was placed into 0.5 ml of ice-cold PBS containing 100 U RNAse inhibitor (RNAsin) (Applied Biosystems, Foster City, CA, USA). Four to fifteen glomeruli were isolated using a stereomicroscope (Zeiss, Jena, Germany) followed by extraction of total RNA. cDNA was generated from glomerular RNA with SuperScript™ III RT (Invitrogen, Carlsbad, CA, USA). Human kidney, spleen, lung and liver mRNA was purchased from Invitrogen/Ambion (Carlsbad, CA). Reactions without RT for each primer set served as controls. PCR cycling was performed with 100 ng template (94°C -3 min; 35 cycles: 94°C -30 sec; 55°C -30 sec; 68°C -30 sec plus 1 min for each kilobase pair (kbp) of PCR product to be amplified; 72°C -7 min). Quantification of gene expression was performed according to the delta Ct method (DeltaCt2/DeltaCt1), as described by others [23], and by this laboratory [22].
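As an illustration only, one common reading of a delta Ct comparison is sketched below in Python. The notation "(DeltaCt2/DeltaCt1)" is not spelled out in the text, so the sketch makes explicit assumptions: equal template input in every reaction, perfect doubling per cycle (amplification factor 2), and no reference-gene normalization. The function names and Ct values are hypothetical.

```python
import math
import statistics
from typing import List, Tuple


def fold_enrichment(ct_glomeruli: float, ct_other: float,
                    amplification: float = 2.0) -> float:
    """Abundance in glomeruli relative to another tissue from threshold cycles.

    A lower Ct means more template, so the exponent is Ct(other) - Ct(glomeruli).
    """
    return amplification ** (ct_other - ct_glomeruli)


def mean_sem(values: List[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean across donors."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical Ct values for one transcript in glomeruli from three donors,
# compared with a single hypothetical Ct measured in lung.
glomerular_cts = [22.0, 21.5, 22.5]
lung_ct = 25.0
folds = [fold_enrichment(ct, lung_ct) for ct in glomerular_cts]
print(mean_sem(folds))
```

If the amplification efficiency is below 2, the `amplification` argument can be lowered accordingly; the ratio shrinks for the same Ct difference.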

Human Glomerular SAGE Database Content
The complete human glomerular SAGE library was deposited in the Gene Expression Omnibus repository http://www.ncbi.nlm.nih.gov/geo/ (record GSE8114, Accession # GSM199994) and in the SAGE Genie collection http://cgap.nci.nih.gov/SAGE as "LSAGE_Kidney_Glomeruli_Normal_B_bjballer1". It consists of 22,907 unique 17 bp tag sequences and the absolute tag count for each sequence. The total tag count in the library is 48,905. The library is also appended in spreadsheet format with tag identification (additional file 1).

Retrieval of Highly Enriched Glomerular Transcripts
The transcripts most highly enriched in human glomeruli identified by this study are shown in Tables 1 and 2 and Additional files 3 and 4. Of the 22,907 tags, 291 were observed with an absolute count of 4 (81 TPM) or greater and enriched more than 10-fold relative to pooled non-kidney SAGE libraries. For 84 of these no reliable match to a known cDNA sequence was found, and a match to incompletely defined ESTs was observed for 8 others. The tags representing Aldolase B, uromodulin, glutamyl aminopeptidase, glutathione peroxidase, and SLC25A45 were excluded from this set because they were not enriched relative to whole kidney. They likely represent transcripts expressed at very high levels in contaminating tubules. Several highly expressed transcripts produced more than one unique tag, which is common and usually reflects priming from internal poly(A)+ runs or alternatively spliced transcripts. After removal of such redundant tags, 133 well-characterized tags highly enriched in glomeruli were established (Tables 1 and 2 and Additional files 3 and 4).
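The first selection step (absolute count of 4 or greater and more than 10-fold enrichment) can be written as a simple filter. The Python sketch below assumes each tag is held as a dictionary with the hypothetical keys `tag`, `count` and `ratio_gp`; the example rows are invented placeholders, not values from the library.

```python
from typing import Dict, List

MIN_COUNT = 4      # absolute tag count threshold stated in the text
MIN_RATIO = 10.0   # enrichment must exceed 10-fold relative to the pool


def highly_enriched(rows: List[Dict]) -> List[Dict]:
    """Keep tags with count >= 4 and Ratio G:P > 10."""
    return [r for r in rows
            if r["count"] >= MIN_COUNT and r["ratio_gp"] > MIN_RATIO]


# Invented placeholder rows for illustration.
rows = [
    {"tag": "TAG_A", "count": 12, "ratio_gp": 58.0},   # kept
    {"tag": "TAG_B", "count": 3,  "ratio_gp": 900.0},  # too few counts
    {"tag": "TAG_C", "count": 40, "ratio_gp": 2.5},    # not enriched enough
]
print([r["tag"] for r in highly_enriched(rows)])
```

The later steps described in the text (excluding tags not enriched relative to whole kidney, and collapsing multiple tags from one transcript) require gene annotation and are not covered by this filter.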
A previously published Sau3A-anchored SAGE library [9] prepared from microdissected human glomeruli contained 184 SAGE tags that were enriched in glomeruli relative to other micro-dissected nephron segments. These represented 156 well-characterized transcripts. As expected, the corresponding NlaIII SAGE tag for 143 of these was also observed in the current glomerular SAGE library (Tables 1 and 2, Additional files 3 and 4, and additional file 2). For 47 transcripts represented in both libraries a 10-fold or greater enrichment of the NlaIII tag relative to non-glomerular cells and tissues was observed and is shown in Tables 1 and 2 and Additional files 3 and 4. The NlaIII tag corresponding to the remaining 96 transcripts identified in the Sau3A library was enriched relative to whole kidney, in keeping with the previous report [9], but less than 10 fold relative to non-renal tissues (additional file 2).
Figure 1: Human Kidney Source Material. RT-PCR for PECAM-1 and the 67 kDa non-integrin laminin receptor LAMR1 for patient 1 (pt 1) and patient 2 (pt 2) (G). Enrichment of the synaptopodin mRNA abundance, determined by RT-PCR, in glomeruli relative to whole kidney cortex from patient 2 (H).
Many of the highly expressed and highly enriched transcripts observed in this library are encoded by genes already known to be unique or highly enriched in glomerular podocytes, for instance Podocin (NPHS2), Nephrin (NPHS1), transcription factor 21 (Pod1, FLJ35700), Protein Tyrosine Phosphatase Glepp 1 (PTPRO), Synaptopodin (SYNPO), indicating that this SAGE database appropriately represents glomerular transcripts and that it identifies transcripts enriched in glomeruli. Some of the SAGE tags enriched in glomeruli represent known endothelial cell-predominant genes, for instance Endomucin (EMCN), claudin 5 (CLDN5), NOSTRIN and CD34, consistent with abundant EC in glomeruli.
To independently demonstrate the utility of this database in defining enrichment of transcripts in glomeruli, RT-PCR comparing the level of expression of 19 transcripts enriched in the glomerular SAGE library with that in lung, spleen and liver was performed. Lung, liver and spleen were not represented in the pooled SAGE libraries used here. For each, glomeruli microdissected from the kidneys of three distinct donors were used. The source mRNA used for RT-PCR was distinct from that used for generation of the SAGE library. Transcripts were chosen to represent a spectrum of glomerular enrichment, and some well-known podocyte-predominant transcripts (TCF21, VEGFA) were included as internal controls. Overall, the degree of glomerular transcript enrichment observed by RT-PCR compared to lung, liver and spleen was similar to that observed by SAGE, though there was variation between lung, spleen and liver (Table 3). The wide range of expression observed in the three non-glomerular tissues was expected, as the pooled SAGE-based comparison does not take into account tissue-to-tissue variation in gene expression.
Finally, it is of note that 117 transcript tags observed 2 or more times and enriched > 500 fold in this glomerular library remain unidentified or poorly characterized (additional file 1). At least some of these will likely prove to be currently unknown glomerulus-predominant transcripts.

In Silico Interrogation of the Glomerular SAGE Database
The current database can be retrieved directly or interrogated in silico. It may be used to determine whether any specific gene is highly expressed in glomeruli, and to define transcripts that are highly enriched relative to other tissues for which SAGE libraries are available.
To assess whether a specific transcript is expressed in glomeruli, the SAGE tags uniquely identifying the transcript can be found at http://cgap.nci.nih.gov/SAGE/ using the "SAGE Anatomic Viewer" [21]. The "Digital Northern" tool is then used to evaluate the level of expression in the SAGE libraries of the collection, which includes the current library. The collection can also be interrogated using specific NlaIII SAGE tags of cDNA sequences for which a gene symbol may not yet have been assigned. The tag can be retrieved from any cDNA sequence by identifying the 17-nt sequence immediately 3' of the last NlaIII site (CATG) prior to the poly(A + ) tail. Its frequency in the glomerular database is an indicator of the level of expression in human glomeruli. The 95% confidence interval for observing any tag with a true count of 4 is ± 3.96. Hence, any transcript producing a tag frequency of 4 per 48,905 (81.8 TPM) or greater has a 95% probability of being represented in this library. Failure to find the SAGE tag representing any specific transcript in this library indicates that its expression level is lower than the limit of detection, or that the transcript does not contain an NlaIII restriction site from which a SAGE tag could be generated.
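The detection limit discussed above can be explored with a short Python sketch. It assumes Poisson sampling of tags, which the text does not state, and it does not attempt to reproduce the ± 3.96 interval quoted above; it only shows how the probability of seeing a tag at least once depends on its expected count in a library of 48,905 tags. The function names are hypothetical.

```python
import math

LIBRARY_TOTAL = 48_905  # total tags in the glomerular library


def prob_observed(expected_count: float) -> float:
    """P(tag seen at least once) under an assumed Poisson sampling model."""
    return 1.0 - math.exp(-expected_count)


def expected_count_for(probability: float) -> float:
    """Expected count needed to reach a given detection probability."""
    return -math.log(1.0 - probability)


def count_to_tpm(count: float, total: int = LIBRARY_TOTAL) -> float:
    """Convert a count in this library to tags per million."""
    return count / total * 1_000_000


print(prob_observed(4.0) > 0.95)                     # expected count of 4
print(round(count_to_tpm(expected_count_for(0.95))))  # TPM at the 95% level
```

Under this assumed model a transcript with an expected count of 4 is detected with a probability above 95%, in line with the statement in the text, while rarer transcripts may be missed by chance alone.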
The "LSAGE_Kidney_Glomeruli_Normal_B_bjballer1" database can also be compared directly to a single, or sets of other SAGE databases in the SAGE Genie collection using the "SAGE Digital Gene Expression Displayer (DGED)" tool at http://cgap.nci.nih.gov/SAGE/. This type of analysis will return data similar to those in additional file 1, though comparison can also be restricted to specific libraries rather than the pool of libraries evaluated here.
Finally, this SAGE library with matching transcript identification, glomerulus to pool ratio and glomerulus to kidney ratio is supplied as additional file 1, where the order is based on tag abundance. This data set contains only 18,152 SAGE tags, as any tag found only once and not in any other library was removed. The table can be retrieved without restriction and, if desired, sorted based on the degree of tag enrichment.
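The filtering and sorting described for additional file 1 can be sketched in Python as below. The sketch assumes the table has been loaded into a list of dictionaries with the hypothetical keys `tag`, `glom_count`, `pool_count` and `ratio_gp`; the actual column names in the file may differ, and the rows shown are invented.

```python
from typing import Dict, List


def drop_unsupported_singletons(rows: List[Dict]) -> List[Dict]:
    """Remove tags seen once in glomeruli and never in any other library."""
    return [r for r in rows
            if not (r["glom_count"] == 1 and r["pool_count"] == 0)]


def sort_by_enrichment(rows: List[Dict]) -> List[Dict]:
    """Order tags from most to least enriched (Ratio G:P, descending)."""
    return sorted(rows, key=lambda r: r["ratio_gp"], reverse=True)


# Invented placeholder rows for illustration.
rows = [
    {"tag": "TAG_A", "glom_count": 1, "pool_count": 0,  "ratio_gp": 0.0},
    {"tag": "TAG_B", "glom_count": 1, "pool_count": 25, "ratio_gp": 3.0},
    {"tag": "TAG_C", "glom_count": 9, "pool_count": 2,  "ratio_gp": 120.0},
]
kept = sort_by_enrichment(drop_unsupported_singletons(rows))
print([r["tag"] for r in kept])
```

Singletons that also occur in other libraries are retained, since their presence elsewhere argues against a sequencing error.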

Discussion
This study established a human glomerular SAGE library that can be used for data mining by investigators with an interest in glomerular cell biology and pathophysiology. The library was appropriately enriched in SAGE tags representing transcripts known to be restricted to glomerular podocytes, including nephrin [24], podocin [25], synaptopodin [26], podocalyxin [27], transcription factor 21 [28], the protein tyrosine phosphatase receptor type O GLEPP1 [29], the cyclin dependent kinase inhibitor C1 [30] and nestin [31]. It is therefore likely that other transcripts whose SAGE tags are much more highly represented in this library compared to SAGE libraries from other tissues and cells are also expressed predominantly in glomeruli.
A SAGE library that used Sau3A as the anchoring restriction enzyme was previously produced from human glomerular mRNA [9]. It identified 155 highly expressed transcripts in glomeruli that were enriched in glomeruli when compared to microdissected non-glomerular nephron segments. Since the previously published glomerular SAGE library is based on the Sau3A anchoring restriction site, it does not allow in silico comparison of tag frequencies with the much larger collection of NlaIII-based SAGE libraries. All except 12 transcripts reported to be enriched in glomeruli by Chabardes-Garonne [9] were observed in the current glomerular SAGE library. The corresponding NlaIII tag for a subset of these (47 tags) was enriched > 10 fold when compared to non-renal tissues and cells (Tables 1 and 2 and Additional files 3 and 4), providing independent evidence that these represent glomerulus-predominant transcripts.
The current study also identified 86 transcript tags that were enriched more than 10 fold in glomeruli, but which were not represented in the previous Sau3A-anchored library (Tables 1 and 2 and Additional files 3 and 4). Failure to find a Sau3A SAGE tag for known glomerulus-restricted genes like nephrin, or an NlaIII SAGE tag for endoglin and VCAM1, suggests either that the tag frequency was too low to be detected or that the required restriction site was absent from the transcript. The current study also shows that several transcripts more highly expressed in glomeruli compared to other nephron segments [9] are not restricted to glomeruli when compared to non-renal tissues or cells (additional file 2). This is not surprising since some transcripts that are not shared between nephron epithelium and glomerular capillary tuft nevertheless may be highly expressed in other tissues.
Table 3 legend: The SAGE tag frequency in glomeruli (G: TPM) and SAGE library pool (P: TPM) as well as the relative SAGE tag enrichment in glomeruli (Ratio G: P) is shown. The transcript abundance relative to lung (G: Lu), spleen (G: Sp) and liver (G: Li) was determined by RT-PCR using distinct sets of microdissected glomeruli from kidneys of three different donors. G: glomeruli, Lu: lung, Sp: spleen, Li: liver. Mean ± SEM.
Several transcripts not previously shown to have a specific function in glomeruli were highly expressed and enriched in glomeruli when compared to non-glomerular tissues. Among these, the tag for the chloride intracellular channel 5 (CLIC5) is very abundant in the glomerular transcript pool, and its frequency in glomeruli was more than 800 fold greater than in other tissues. The transcript "DKFZp564B076", whose SAGE tag was previously shown to be enriched in microdissected glomeruli [9] and later in cultured glomerular EC in this laboratory [22], is identical to the 3' end of CLIC5. CLIC5 is an ezrin-binding protein involved in maintaining actin-based microvilli in the placenta and actin-based stereocilia in the inner ear [32]. Its role in glomerular cell function is as yet undefined. The transcript for the basal cell adhesion molecule (BCAM) is also very abundant in glomeruli and enriched approximately 58 fold. BCAM is a glycoprotein that functions as a receptor for alpha5 laminin. BCAM immunoreactivity is observed in both glomerular podocytes and glomerular EC, and mice deficient in BCAM have significant structural abnormalities of glomeruli [33]. Glomerular expression of the parathyroid hormone receptor 1 (PTHR1) was not expected. PTHR1 is very abundant in renal proximal tubule cells and could therefore represent proximal tubule contamination. However, since the PTHR1 SAGE tag was less abundant in renal cortex than in glomeruli (Table 1 and Additional file 3), its enrichment in this library cannot be due to proximal tubule contamination. Indeed, mesangial cells express PTHR1 [34]. More work is required to define the function of PTHR1 in mesangial cells. In this regard, it is of great interest that Sclerostin, an inhibitor of bone matrix formation whose expression is regulated by PTH, is also expressed at much higher levels in glomeruli than in most non-renal tissues and cells (Table 1 and Additional file 3) or in other nephron segments [9].
While we have no comparison with a bone SAGE library where sclerostin is likely expressed at high levels, the finding nonetheless suggests that it could be involved in regulating extracellular matrix deposition in glomeruli. Nephronectin, a ligand for integrin alpha8beta1, is known to be essential for renal development, and is expressed in renal epithelium. Enrichment of the nephronectin SAGE tag in the glomerular library relative to kidney cortex is in keeping with the observation by Brandenberger et al. [35], who observed very strong nephronectin immunoreactivity in differentiating glomeruli. The secreted glycoprotein testican 2 (SPOCK2), which belongs to the osteonectin/SPARC family [36], is also highly expressed and enriched in glomeruli. Members of this family of proteins regulate cell-cell and cell-matrix interactions, and SPOCK2 is induced after glomerular injury [37]. The other protein in this family is connective tissue growth factor (CTGF). The SAGE tag for CTGF was observed at a high frequency in glomeruli (additional file 1) but it was not highly enriched relative to other tissues. Nonetheless, both SPOCK2 and CTGF likely play a critical role in regulating glomerular remodeling. In 2006 Lakhe-Reddy and coworkers [38] described the localization of beta 8 integrin to glomerular mesangial cells and observed that its expression may suppress mesangial cell dedifferentiation via Rac1 activation. The SAGE tag for integrin beta 8 was highly expressed and enriched in this glomerular library.
While several semaphorins are expressed in renal glomeruli, so far a role for semaphorin 3G, whose SAGE tag is abundant and enriched in this database, has not been described. Still, semaphorin 3G, which has a repulsive function via neuropilin 2 binding in CNS neuronal guidance, is also highly expressed in kidney [39], raising the question of whether it serves an important function in glomeruli. Based on this study many other transcripts are highly enriched in glomeruli. It is hoped that other investigators will use this database as a tool to further define the transcriptome of glomerular cells in health and disease.
We did not observe the NlaIII SAGE tag for EHD3, a transcript previously shown to be unique for glomerular endothelial cells [12], in this library. A SAGE tag for EHD3 also is not observed in the previously published Sau3A-anchored library [9]. Failure to observe this tag does not detract from the previous observations but only suggests that the EHD3 transcript abundance was too low to generate a SAGE tag in the two glomerular SAGE libraries.
Finally, not all tags observed in this SAGE library have as yet been matched to a specific gene. For some of these unidentified SAGE tags, matching sequences within the human genome are observed, but whether they represent specific transcripts is currently not known.

Conclusion
We have constructed a new human glomerular SAGE library, based on the NlaIII anchoring restriction site. The database can be searched to determine whether specific transcripts are highly expressed and/or enriched in glomeruli, and it can be used as a resource to further study transcripts that appear to be glomerulus-enriched but whose function in glomeruli has not been investigated so far.

Availability and requirements
The SAGE database (GEO Accession #GSM199994) described here is available for download from http://www.ncbi.nlm.nih.gov/geo/. It can also be downloaded from, or interrogated in silico at http://cgap.nci.nih.gov/SAGE/ without restriction. The annotated database containing tag sequences, glomerular frequencies, gene identification, as well as frequency ratios to pooled and kidney libraries is available as additional file 1.