Long-range sequence analysis in Xq28: thirteen known and six candidate genes in 219.4 kb of high GC DNA between the RCP/GCP and G6PD loci.
DNA comprising 219 447 bp was sequenced in nine cosmids and verified at >99.9% precision. Of the standard repetitive elements, 187 Alus make up 20.6% of the sequence, but there are only 27 MERs (2.9%) and 17 L1 fragments (1.6%). This may be characteristic of such high GC (57%) regions. The sequence also includes an 11.3 kb tract duplicated with 99.2% identity at a distance of 38 kb. The region is 80-90% transcribed and 12.5% translated. Thirteen known genes and their exon-intron borders are all accurately predicted, at least in part, by GRAIL programs, as are six additional genes. From centromere to telomere, the orientation of transcription varies among the first eight genes, then runs centromeric to telomeric for the next five, and is in the opposite sense for the last six. Eighteen of the 19 genes are associated with CpG islands. Two islands are exact copies in the 11.3 kb repeat units, and could thus give rise to double dosage levels of an X-linked gene. Another island is associated with two genes transcribed in opposite directions. From the sequence data, three genes and their exon structure are inferred. One of them, previously associated with HEX2, is shown to be a different gene unrelated to hexokinases; a second gene, previously known from an EST, is identified as plexin on the basis of its 65.5% identity with the Xenopus analog; and a third encodes a subunit of a vacuolar H+-ATPase and is named VATPS1.
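The GC content (57%) and the CpG island associations reported above rest on simple base-composition statistics. The following is a minimal sketch of how such values can be computed from a raw sequence string. It is not the procedure used in this work: the function names `gc_fraction` and `cpg_obs_exp` are hypothetical, and the observed/expected CpG ratio shown is a commonly used descriptive measure assumed here for illustration only.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored. Hypothetical helper,
    not taken from the source.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G).

    Assumed illustrative measure; returns 0.0 when C or G is absent.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


if __name__ == "__main__":
    # Toy sequence for demonstration only; not data from the Xq28 region.
    toy = "CGCGAT"
    print(gc_fraction(toy))
    print(cpg_obs_exp(toy))
```

In practice such statistics are evaluated in sliding windows along the sequence, with window size and thresholds chosen by the analyst; none of those parameters are specified in the text above.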