The genetic bases of skeletogenesis are expected to shed light on the origins of metazoan biomineralization. Here we review aspects of the genetic machineries of invertebrate skeletogenesis, including the regulatory genes involved in biomineralization, with an enumerative reference to the genes encoding skeletal matrix proteins. The complete primary structure has been determined for a total of 77 skeletal matrix proteins in invertebrates representing five animal phyla. The presence of repeated sequences and the prevalence of acidic proteins stand out as common features among those proteins. These similarities are interpreted as convergence because the proteins are not similar at the primary structure level. C-type lectin-like domains are shared by the calcium carbonate skeletal matrix proteins of molluscs and deuterostomes. However, the important sites for carbohydrate binding are not conserved between these two groups. Several arthropod skeletal matrix proteins have the Rebers-Riddiford consensus sequence, which is characteristic of non-calcified cuticular proteins of arthropods, indicating that these skeletal matrix proteins were recruited from the non-calcified cuticular proteins after arthropods diverged from other metazoan groups. Dermatopontin, a molluscan shell matrix protein, is also inferred, on the basis of molecular phylogenetic analysis, to represent a cooption for biomineralization after molluscs diverged from other metazoan groups. Those findings support the premise that the genetic machineries of biomineralization evolved independently many times after the divergence of metazoan phyla, and that some common genes that served other functions have been coopted for biomineralization in various lineages.
Introduction
After the appearance of organisms which produced distinctly larger trace fossils toward the end of the Neoproterozoic and before the emergence of the relatively abundant fauna producing small shelly fossils that marks the beginning of the Cambrian explosion, metazoans variously adapted to a range of distinctive habitats and evolved to exhibit many disparate body-plans (Conway Morris, 1998; Valentine, 2004). The occurrence of massive biominerals in metazoans may be one of the many novelties of their evolution which appeared in that period. Nevertheless, the apparent absence of biominerals in the Ediacaran fauna recognized at the close of the Neoproterozoic and nearly simultaneous occurrence of biominerals in cyanobacteria (in spite of some Proterozoic cyanobacteria having had mineralized sheaths), algae, foraminifers and radiolarians (Runnegar and Bengtson, 1990) seem to call for specific explanations for metazoan biomineralization, and many researchers have tried to explain it. The explanations chiefly concern physicochemical environmental factors (e.g., O2, CO2, Ca, P, temperature), ecological relationships (e.g., predation, primary production) and other adaptive advantages (e.g., biomechanics, detoxification) (Simkiss, 1977; Vermeij, 1989; Runnegar and Bengtson, 1990; Bengtson, 1994; Nedin, 1999; Valentine, 2004; Cohen, 2005). While it is logical for us to search for triggers that brought about the simultaneous occurrences of biominerals, the mechanisms that enabled biomineral formation, or the underlying machineries of biomineralization, including its genetic bases, would also be important and must shed light on this problem.
Among metazoans, our knowledge of biomineralization mechanisms in invertebrates has been limited compared to that in vertebrates. Thanks to the spread of molecular biological techniques, however, data from invertebrates have accumulated in recent years. It is thus probably worthwhile to consider the origins of invertebrate biominerals from the viewpoint of mechanisms at the molecular level. First we refer to the finding that implanted molluscan nacre induces osteogenesis in vertebrates. Then we review the gene regulatory networks involved in biomineralization, including those of vertebrates for comparison. Finally, we review matrix proteins in invertebrate skeletons, which have been thought to play very important roles in biomineralization. We enumerate the invertebrate skeletal matrix proteins of which the complete primary structure has been clarified to date, and discuss the underlying mechanisms and the origins of invertebrate biomineralization, referring also to the osteogenic activity of molluscan nacre and the gene regulatory networks.
Abbreviations used in this article
BMP: bone morphogenetic proteins, cDNA: complementary DNA, CRD: carbohydrate recognition domain, CTLD: C-type lectin-like domains, ELISA: enzyme linked immunosorbent assay, EST: expressed sequence tag, IGFBP: insulin-like growth factor binding protein, PMC: primary mesenchyme cell, SDS-PAGE: sodium dodecyl sulfate-polyacrylamide gel electrophoresis.
Three-letter and one-letter codes of the common amino acids
Ala (A): alanine, Arg (R): arginine, Asn (N): asparagine, Asp (D): aspartic acid, Cys (C): cysteine, Gln (Q): glutamine, Glu (E): glutamic acid, Gly (G): glycine, His (H): histidine, Ile (I): isoleucine, Leu (L): leucine, Lys (K): lysine, Met (M): methionine, Phe (F): phenylalanine, Pro (P): proline, Ser (S): serine, Thr (T): threonine, Trp (W): tryptophan, Tyr (Y): tyrosine, Val (V): valine.
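For readers who handle these sequences computationally, the code table above can be written as a small lookup. The following is a minimal Python sketch; the names THREE_TO_ONE and to_one_letter are illustrative assumptions and do not come from any published tool.

```python
# Three-letter to one-letter amino acid codes, as listed in the table above.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(motif: str) -> str:
    """Convert a hyphen-separated three-letter motif to one-letter code.

    Raises KeyError if a residue name is not one of the 20 common codes.
    """
    return "".join(THREE_TO_ONE[residue] for residue in motif.split("-"))


# Example: a motif written in three-letter code.
print(to_one_letter("Asp-Gly-Ser-Asp"))  # DGSD
```

The lookup covers only the 20 common amino acids listed above; any other residue name raises an error rather than being guessed.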
Osteogenic activity of molluscan nacre
Bobbio (1972) exhumed Mayan skulls in Honduras. He discovered that the ancient Mayans used nacre as a dental implant, which showed perfect osteointegration. Thus Bobbio rediscovered an osteogenic property of nacre. Atlan et al. (1997) took nacre, or mother-of-pearl, from shells of the giant oyster Pinctada maxima, and implanted powdered nacre into human bones, aiming at reducing bone loss. The implanted nacre was accepted without any immunological response. Moreover, osteoblasts, the bone-forming cells, were activated, and new bone was formed. The osteogenic activity of nacre obtained from P. maxima has also been shown by in vivo experiments in rats (Liao et al., 1997), in sheep (Delattre et al., 1997; Lamghari et al., 1999, 2001b; Milet et al., 2004) and in rabbits (Lamghari et al., 2001a). When Dupoirieux et al. (1994) implanted coral skeletons into rat bones, half were accepted, but a complete bone repair was not achieved. This observation suggests that coral skeletons have weaker osteogenic activity than molluscan nacre. This may reflect evolutionary relationships among them, because mammals are evolutionarily more distantly related to corals than to molluscs (Westbroek and Marin, 1998). In vitro experiments using nacre chips of P. maxima and human osteoblasts also suggested osteogenic activity of nacre: induction of mineralization occurred preferentially in the osteoblasts attached to nacre chips (Lopez et al., 1992; Silve et al., 1992).
In order to confirm the hypothesis that the nacre contains diffusible signal molecules which are able to stimulate mammalian cells involved in bone formation, in vitro experiments using water-soluble matrix molecules from P. maxima nacre and mammalian cell cultures have been performed. These experiments showed that the water-soluble matrix is also effective in inducing differentiation or activation of rat osteoblasts (Mouriès et al., 2002), mouse pre-osteoblasts (Milet et al., 2004), human fibroblasts, the cells inducing bone formation (Almeida et al., 2000, 2001; Mouriès et al., 2002; Milet et al., 2004), and rat bone marrow stromal cells, a pool of bone precursor cells (Lamghari et al., 1999; Mouriès et al., 2002; Milet et al., 2004). The water-soluble matrix from P. maxima nacre was also revealed to increase the amount of Bcl-2 in rat osteoblasts, a protein which inhibits programmed cell death (Korsmeyer, 1992), consequently prolonging the life of mature osteoblasts (Mouriès et al., 2002). These observations in all implied that, although the bone and the nacre from P. maxima are not homologous morphologically, some parts of the complex machinery involved in their formation may be homologous (Westbroek and Marin, 1998).
The nacre from the Swedish freshwater pearl mussel Margaritifera sp. also promoted rat bone formation in vivo. In this case, however, no active osteoblasts were detected in direct contact with nacre at early stages. Therefore, the osteogenic activity of Margaritifera nacre could not be confirmed (Liao et al., 2000, 2002).
Gene regulatory networks involved in metazoan biomineralization
Vertebrates
Gene regulatory mechanisms involved in biomineralization have been best studied in vertebrates. Osteoblast differentiation and bone formation are known to be regulated by many local factors. Among these, bone morphogenetic proteins (BMPs) are one of the most potent factors. The bone-inducing ability of BMPs was first reported in the 1960s (Urist, 1965; Urist and Strates, 1971), but it was very difficult to isolate BMPs as a single protein from the decalcified bone matrix until the late 1980s. Molecular cloning of human BMP-2 and -4, and the purification and sequencing of bovine BMP-3 were performed in the late 1980s (Wozney et al., 1988; Luyten et al., 1989; Wozney, 1992). Consequently it has been clarified that BMPs are signal molecules belonging to the transforming growth factor β (TGF-β) superfamily, and play critical roles in bone induction. To date, around 20 BMP members have been identified (Shen et al., 2004) and BMP-3 was revealed to be an antagonist to other osteogenic BMPs (Daluiski et al., 2001). Three signal molecules, Smad1 (Hoodless et al., 1996), 5 (Nishimura et al., 1998) and 8 (Chen et al., 1997) are revealed to be the immediate downstream molecules of BMP receptors and play a central role in BMP signal transduction (Chen et al., 2004), and other signal molecules, Sonic and Indian hedgehogs are involved in osteoblast differentiation by interacting with BMPs (Yamaguchi et al., 2000). Some transcription factors such as core-binding factor α-1 (Cbfa1), also known as Aml3 or Runx2, Osterix (Osx), which is a downstream molecule of Cbfa1, and the Activator protein-1 (AP-1) family including Fra-1 and ΔFosB are also known to play a major role in the differentiation and maturation of osteoblasts. Cbfa1 or Osx null mice showed a complete lack of ossification (Komori et al., 1997; Nakashima et al., 2002), and overexpression of Fra-1 and ΔFosB increased bone mass (Jochum et al., 2000; Sabatakos et al., 2000). 
The three transcription factors having a paired-type homeodomain, Cart1 (Zao et al., 1994), Alx3 (ten Berge et al., 1998), and Alx4 (Qu et al., 1997), regulate the skeletons of the face, neck and limbs (Beverdam and Meijlink, 2001). Although Alx3 single mutant mice did not show obvious abnormalities, Alx3/Alx4 double mutant mice showed more severe craniofacial abnormalities than Alx4 single mutant mice (Beverdam et al., 2001). Cart1/Alx4 double mutant mice exhibited severe craniofacial and limb abnormalities (Qu et al., 1999). In humans, mutations in Alx4 caused skull ossification defects (Wu et al., 2000; Mavrogiannis et al., 2001). Furthermore, two classes of signaling molecules, parathyroid hormone related peptide (PTHrP) and fibroblast growth factors (FGFs), are known to be critically involved in endochondral bone formation, and three transcription factors of the Sox family, Sox9, L-Sox5 and Sox6, have essential roles in chondrocyte differentiation (de Crombrugghe et al., 2001).
Invertebrates
As to gene regulatory mechanisms involved in biomineralization in invertebrates, several components of the gene networks and their regulatory relationships have been revealed to some extent. The identified components or the component candidates of the gene networks were transcription factors such as engrailed (Moshel et al., 1998; Jacobs et al., 2000; Wanninger and Haszprunar, 2001; Nederbragt et al., 2002), Hox1, Hox4 (Hinman et al., 2003), and signal molecules such as decapentaplegic (Nederbragt et al., 2002) in molluscs, and transcription factors such as engrailed (Lowe and Wray, 1997), ets1 (Kurokawa et al., 1999), Alx1 (Ettensohn et al., 2003), and the possibly transmembrane protein P16 (Cheers et al., 2005) in echinoderms. SM50, one of the skeletal matrix protein genes of sea urchins, is located directly downstream of ets1 (Kurokawa et al., 1999). P16 is downstream of Alx1 (Cheers et al., 2005) in the primary mesenchyme cell (PMC) gene regulatory network in sea urchins (Cheers et al., 2005). In the micromere PMC gene regulatory networks in sea urchins, genes upstream of Alx1 are Repressor X and Pmar1 (Ettensohn et al., 2003).
The gene regulatory networks in some metazoan phyla have the similar components in common, but it may not necessarily support evolutionary links of the biomineralization systems between them. engrailed (en) is considered to be at least indirectly involved in skeletogenesis in both molluscs and echinoderms. en is expressed at the boundary around the embryonic shell of molluscs (Nederbragt et al., 2002) and at the boundaries between newly forming skeletal ossicles in brittle stars (Lowe and Wray, 1997). en expression patterns in brittle stars are not a common feature among echinoderms and thus en may have been recruited for skeletogenic function within echinoderms (Lowe and Wray, 1997) independently from molluscs. decapentaplegic (dpp) is an ortholog of BMP2/4 which is also involved in osteogenesis in vertebrates. dpp-BMP2/4 was discovered from the mantle of the mollusc Pinctada fucata (Matsushiro and Miyashita, 2004). In the case of another mollusc Patella vulgata, however, Dpp-BMP2/4 is probably employed for setting up the boundary around shells rather than inducing shell formation (Nederbragt et al., 2002).
Members of the Cart1/Alx3/Alx4 subfamily are involved in skeletogenesis in both echinoderms (Alx1) and vertebrates (Cart1, Alx3, Alx4), possibly indicating the common origin of a part of their biomineralization systems. Most of the components, however, are different between echinoderms and vertebrates, and common features cannot be found between the regulatory relationships of the genes involved in their biomineralization. These arguments seem to harmonize with the independent origins of biomineralization in many metazoan phyla, and we could perhaps explain the common employment of Alx subfamily members in echinoderms and vertebrates as independent gain-of-function events. Except for vertebrates and echinoderms, however, there is very little information on the regulatory gene cascades involved in metazoan biomineralization, and it is premature to draw definitive conclusions.
Skeletal matrix proteins in invertebrates
Porifera
Silica is an amorphous form of hydrated SiO2, and silicon is an abundant element in the earth's crust. There are three classes of sponges, two of which, the Hexactinellida and the Demospongiae, produce silicified spicules, while the third, the Calcarea, produces calcium carbonate spicules. Incidentally, the Calcarea is considered to be more closely related to the other metazoan phyla than to either the Hexactinellida or the Demospongiae (Kruse et al., 1998). These spicules, supporting the organisms and defending them against predation, are produced in membrane-enclosed vesicles in specialized cells, the sclerocytes (Perry, 2003). After reaching a critical size, the spicules are extruded from the cells and their growth proceeds in the extracellular space (Müller et al., 2005).
Shimizu et al. (1998) isolated three very similar proteins of 27, 28 and 29 kDa in size from the spicules of Tethya aurantia belonging to the Demospongiae. They dubbed the three proteins Silicatein α, Silicatein β and Silicatein γ, respectively, and the complete primary structure of Silicatein α (and later on Silicatein β) was deduced from the full-length cDNA. Silicatein α is rich in serine, glycine, alanine and tyrosine, and has a high degree of sequence similarity to cathepsin L, a papain-like cysteine protease. The catalytic cysteine residue at the active site of the cysteine protease, however, is replaced by serine in Silicatein α, and Silicateins did not display esterase activity when tested with synthetic chromogenic substrates (Shimizu et al., 1998). Recombinant Silicatein α nevertheless catalyzed the polymerization of silica at moderate temperature and pH in vitro (Cha et al., 1999), suggesting that Silicatein α has an enzymatic function.
The axial filaments in the spicules of the common Mediterranean sponge Petrosia ficiformis (Demospongiae) contained two water-insoluble proteins of 30 and 23 kDa (Pozzolini et al., 2004). Pozzolini et al. (2004) showed this 23 kDa protein to be a homolog of Silicateins; Funayama et al. (2005) called it “Pf Silicatein”. Another homolog of Silicateins (Ef Silicatein) was also identified from the EST library of a freshwater sponge, Ephydatia fluviatilis, belonging to the Demospongiae (Funayama et al., 2005). Recently a cluster of four genes consisting of an ankyrin repeat gene, silicatein β (encoding SILICAb_SUBDO), a tumor necrosis factor receptor-associated factor gene and a protein kinase gene was isolated from the genome of Suberites domuncula, and the four genes including silicatein β were revealed to be expressed synchronously (Schröder et al., 2005). This observation suggests that a coordinated expression of physically linked genes is essential for the synthesis of the spicules.
A silicase identified from S. domuncula is known to be a silica-catabolizing enzyme (Schröder et al., 2003), which shares the highest sequence similarity to the family of carbonic anhydrases. Silicase is able to degrade silica to form free silicic acid, and possibly contribute to the metabolism of silica deposition.
While Krasko et al. (2000) found that silicates up-regulate SILICAa_SUBDO which is an S. domuncula ortholog of Silicatein-α, Müller et al. (2005) showed that selenium, a trace element essential for metazoans, up-regulates two other genes, and increased spicule formation using an in vitro cell culture system of S. domuncula belonging to the Demospongiae.
One of these two genes showed the closest similarity to the human selenoprotein M. The biological role of selenoprotein M in higher metazoan phyla is not known in detail, but it has previously been shown that selenocysteine, the 21st amino acid in ribosome-mediated protein synthesis (Stadtman, 1996), participates in redox reactions in selenoproteins, together with a nearby cysteine partner (Hatfield and Gladyshev, 2002). This constellation of selenocysteine and cysteine residues also exists in S. domuncula selenoprotein M, suggesting that this protein functions as an enzyme (Müller et al., 2005). The other up-regulated protein, termed spicule-associated protein, showed no significant sequence similarity to any protein in the database. This protein has six highly similar repeats of 20 amino acid residues, in which hydrophobic amino acids are surrounded by polar amino acids, and is expected to form a tight association with membranes. The immunohistological data showed that selenoprotein M, spicule-associated protein and Silicatein are all present in the axial filaments and at the surface of the spicules, suggesting that they are associated with spicule formation (Müller et al., 2005).
Collagen is the major structural protein in general. In sponges, spicules are embedded in an organic matrix containing collagen (Müller et al., 2005). Some collagen-like proteins are identified in sponges (Exposito et al., 1991; Krasko et al., 2000), and one of them, COL1_SUBDO identified from S. domuncula (Schröder et al., 2000), was revealed to be up-regulated by silicates as well as by Silicateins. Thus COL1_SUBDO is likely to be involved in the formation of the collagenous sheath required for the functional skeletons (Krasko et al., 2000).
Cnidaria
Reef corals belonging to the Scleractinia accrete hard skeletons leading to biomineralization products of spectacular size, while soft corals belonging to the Octocorallia and the Antipatharia have spicules supporting the organisms (Cohen and McConnaughey, 2003). Scleractinian skeletons are nearly exclusively composed of aragonite, and spicules of octocorals are composed of calcite (Lowenstam and Weiner, 1989). The spicules contain a proteinaceous axial filament.
Galaxin was identified from the soluble organic matrix in the exoskeleton of the scleractinian coral Galaxea fascicularis, and is the only coral skeletal matrix protein of which the complete primary structure has been elucidated (Fukuda et al., 2003). Galaxin transcripts were detected in the adult coral, but not in planktonic larvae. Galaxin has two potential N-linked glycosylation sites and a tandem repeat structure in which a sequence of 27–31 amino acid residues repeats ten times. Each repeat unit contains a dicysteine sequence. An in vitro study using a synthetic peptide suggests that Cys forms an intrarepeat cyclocystine loop (Liff and Zimmerman, 1998) and two terminal Cys may contribute to cross-linking to other molecules. The abundance of cysteine residues raises the possibility that the protein is highly cross-linked to form a macromolecular network. Galaxin has no acidic domain and did not exhibit a significant Ca2+-binding ability when examined using the Ca2+-overlay analysis (Maruyama et al., 1984; Fukuda et al., 2003). These observations, however, do not indicate that Ca2+-binding proteins are unimportant or absent in the coral biomineralization systems (Isa and Okazaki, 1987). Partial amino acid sequences containing many aspartic acids have been reported from both exoskeletons of a scleractinian coral (Puverel et al., 2005) and spicules of an octocorallian coral (Rahman and Isa, 2005). Ca2+-binding proteins were also detected in hard tissues of both scleractinian and octocorallian corals (Watanabe et al., 2003; Rahman and Isa, 2005). In addition, Ca2+-binding phospholipids were detected in scleractinian corals, and they might be important for scleractinian biomineralization, for example by serving as seeding sites for calcium carbonate deposition (Isa and Okazaki, 1987).
Arthropoda
Most crustaceans possess a calcified outer skeleton, the carapace, which they replace completely and periodically with moulting cycles to permit growth. Thus they need a large amount of calcium during the pre-moult period. They have developed several calcium storage strategies, and during the period they absorb calcium from diverse storage structures such as gastroliths or midgut caeca as well as from food or water (Lowenstam and Weiner, 1989; Testenière et al., 2002). In crayfish, exoskeleton and gastrolith consist of calcite and amorphous calcium carbonate, respectively (Takagi et al., 2000). In general, exoskeletons of crustaceans consist of calcium carbonate under a crystalline state and/or an amorphous form, whereas calcium storage structures are always in an amorphous form (Luquet and Marin, 2004).
Andersen and coworkers extracted a soluble matrix protein from the calcified exoskeleton of the northern red shrimp Pandalus borealis and determined its complete amino acid sequence by combined mass spectrometry and Edman degradation (Jacobsen et al., 1994). This is the first report of the complete primary structure of a matrix protein identified from crustacean hard tissues. This protein, H1a, is rich in hydrophobic amino acids such as proline, alanine and valine (Jacobsen et al., 1994). On the other hand, Suzuki et al. (2001) extracted an insoluble matrix protein, Pb CP-12.7, from the same origin, the calcified exoskeleton of Pandalus borealis, and its complete primary structure was determined by direct sequencing. The sequence is rich in hydrophobic amino acids, and includes a repeated motif, X-A-G-X-X-P-Y. This protein is insoluble in water but soluble in methanol, and shows chitin-binding ability using chromatography with a chitin-packed column. Because P. borealis is a small shrimp and a batch of 50–100 shrimps was needed to extract proteins, they then studied bigger decapods, the American lobster Homarus americanus (Kragh et al., 1997; Nousiainen et al., 1998) and the rock crab Cancer pagurus (Andersen, 1999). They extracted 13 matrix proteins from the calcified exoskeleton of H. americanus; these proteins are numbered according to their molecular mass in kilodaltons and the number is preceded by the letters HaCP (H. americanus cuticular protein). Similarly, the 12 matrix proteins identified from the calcified exoskeleton of C. pagurus are indicated by CpCP (C. pagurus cuticular protein), and the five matrix proteins from the arthrodial membranes, flexible regions of the exoskeleton, are indicated by CpAMP (C. pagurus arthrodial membrane protein). 
A common 18 amino acid motif containing three highly conserved glycine residues is shared by HaCP11.6, CpCP12.46, CpCP12.43, CpCP11.58 (these four proteins contain four copies of the motif), HaCP6.3, HaCP5.9, HaCP5.6, HaCP4.6, HaCP4.5, HaCP4.4, CpCP4.98, CpCP4.66, CpCP4.63, CpCP4.59 and CpCP4.34 (these 11 proteins contain two copies). The fact that this 18 amino acid motif is found in 15 out of the 25 proteins derived from the calcified exoskeletons examined, and that it is found neither in any of the proteins derived from the arthrodial membranes examined nor in any of the cuticular proteins of insects, suggests that it is involved in the calcification process (Andersen, 1999). The Rebers-Riddiford consensus sequence, which is a chitin-binding domain found in many cuticular proteins of insects and crustaceans, is present in CpCP11.14, and a truncated version of the consensus sequence is found in HaCP18.8 and CpCP5.75 (Nousiainen et al., 1998; Andersen, 1999). The fact that the consensus sequence is found in only three out of the 25 proteins in the calcified exoskeleton examined and that the consensus sequence is mainly found in proteins derived from flexible cuticles (Andersen, 1999) suggests that the consensus sequence is not likely to be involved in the calcification process.
Orchestin is a water-soluble non-glycosylated acidic matrix protein extracted from calcium carbonate concretions in the midgut caeca of the land crustacean Orchestia cavimana (Testenière et al., 2002). The amino acid sequence is rich in acidic residues such as aspartic acid (16.7%) and glutamic acid (13.0%), and several tandem and periodic repeats such as E-S-R/E-E-E-P-R-K-L, D-D-S-R-E, S-D-E-S, S-D-E, S-R-E, S-D and D-S are recognized (Testenière et al., 2002). Orchestin was shown to have a calcium-binding ability using the Ca2+-overlay analysis. The Ca2+-binding occurred only via the phosphoserine residues, although tyrosine residues are also phosphorylated (Hecker et al., 2003). Both the transcripts and translation products are expressed in the calcium storage organ during the premoult periods (Testenière et al., 2002). The protein is synthesized also during the postmoult period as a component of the organic matrix of the calcium spherules, by which calcium resorbed from the storage organ is translocated (Hecker et al., 2004). Thus Orchestin is probably a key molecule in two transitory calcified mineralization processes in the calcium storage organ and the calcium spherules.
GAMP (Gastrolith Matrix Protein) is an insoluble matrix protein extracted from gastroliths of the crayfish Procambarus clarkii (Ishii et al., 1996, 1998). The amino acid sequence is rich in alanine and glycine residues and has no potential N-linked glycosylation site. The sequence includes a repeated motif, Q-Q-A-A-P-A/T, and shows some sequence similarity to involucrin, a human keratinocyte protein, which is cross-linked to the membrane proteins (Tsutsui et al., 1999). GAMP was revealed to inhibit calcium carbonate precipitation in in vitro experiments, by monitoring the pH values of supersaturated calcium carbonate solutions, but Ca2+-overlay analysis indicated that GAMP has almost no Ca2+-binding ability. The transcripts are expressed in the gastrolith disc, in which a pair of gastroliths is formed, during the premoult period (Tsutsui et al., 1999). The immunolocalization signals of the translation products are also observed in the gastrolith disc as well as in the gastroliths and the exocuticle in the exoskeleton. The distribution of GAMP immunoreactivity roughly corresponds to that of calcium carbonate (Takagi et al., 2000).
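The motif notation used in this section (X for any residue and a slash for alternative residues, as in X-A-G-X-X-P-Y and Q-Q-A-A-P-A/T) corresponds closely to a regular expression. The following Python sketch shows one possible way to scan a sequence for such motifs. The function names are hypothetical, and the test sequence is invented for illustration; it is not a real protein sequence and is not taken from any of the cited studies.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate a hyphen-separated motif such as 'Q-Q-A-A-P-A/T' into a regex."""
    parts = []
    for position in motif.split("-"):
        if position == "X":
            parts.append("[A-Z]")  # X stands for any residue
        elif "/" in position:
            # 'A/T' means either residue is allowed at this position
            parts.append("[" + position.replace("/", "") + "]")
        else:
            parts.append(position)  # a single fixed residue
    return "".join(parts)


def find_motif(motif: str, sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every occurrence, overlaps included."""
    # A lookahead is used so that overlapping occurrences are also reported.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


# Invented toy sequence (an assumption for illustration only).
TOY_SEQUENCE = "MKAAGVLPYTSAGQQPYLL"

print(motif_to_regex("Q-Q-A-A-P-A/T"))            # QQAAP[AT]
print(find_motif("X-A-G-X-X-P-Y", TOY_SEQUENCE))  # [(2, 'AAGVLPY'), (10, 'SAGQQPY')]
```

A match found in this way indicates only that the pattern occurs in the sequence; it says nothing by itself about homology or function.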
Calcification-associated peptide-1 (CAP-1) and CAP-2, a protein similar to CAP-1, were isolated from the exoskeleton of the crayfish Procambarus clarkii, the same species from which GAMP had been identified (Inoue et al., 2001, 2003, 2004). In CAP-1, one of the serine residues was found to be phosphorylated (Inoue et al., 2001). Both CAP-1 and CAP-2 have the Rebers-Riddiford consensus sequence, a chitin-binding domain, and actually exhibited chitin-binding ability when examined using the method of Folders et al. (2000). Both proteins also showed inhibitory activity on calcium carbonate precipitation in an in vitro dose-dependent anti-calcification assay, in which the formation of calcium carbonate precipitates was monitored by the turbidity of the solution. The transcripts of the two proteins are both expressed in the epidermal tissues during the postmoult period, where and when the calcification takes place. Taking these results together, the authors surmised that CAP-1 and CAP-2 might serve as a connecting peptide between chitin and calcium carbonate in the cuticle and bind to the surface of calcium carbonate crystals so as to function as a nucleator or a regulator of calcification (Inoue et al., 2001, 2003, 2004).
In the prawn Penaeus japonicus, four cDNA species were identified using the differential display technique; their transcripts are expressed in the epidermal cells underlying the exoskeleton specifically during the postmoult period. All four cDNA species, DD4, DD5, DD9A and DD9B, were sequenced to reveal the deduced amino acid sequences (Endo et al., 2000; Watanabe et al., 2000; Ikeya et al., 2001). The amino acid sequence of DD4, which was renamed Crustocalcin (Endo et al., 2004), contained a glutamic acid-rich region and the Rebers-Riddiford consensus sequence, and exhibited sequence similarity to the Drosophila Ca2+-binding protein Calphotin. Ca2+-overlay analysis indicated clearly that Crustocalcin/DD4 has a Ca2+-binding ability (Endo et al., 2000) and partial segments of Crustocalcin/DD4 induced calcium carbonate crystallization in vitro (Endo et al., 2004). The amino acid sequence of DD5 consists of tandem repeats of a 93–98 amino acid sequence that is repeated 13 times. This repeated motif contains the Rebers-Riddiford consensus sequence, possibly contributing to cross-linking of chitin fibers. DD5 shows a sequence similarity to HaCP18.8, an exoskeletal protein of the American lobster H. americanus (Ikeya et al., 2001). Based on the spatiotemporal expression patterns of the four transcripts and the sequence similarities to crustacean and insect cuticular proteins exhibited by the four proteins, the authors interpreted all four proteins, Crustocalcin/DD4, DD5, DD9A and DD9B, as protein components of the exoskeleton, suggesting that they play a role in the calcification of the crustacean exoskeleton (Endo et al., 2000; Watanabe et al., 2000; Ikeya et al., 2001). However, the authors also pointed out that two of them, DD9A and DD9B, may not be involved in calcification (Watanabe et al., 2000), because they exhibit sequence similarities with proteins extracted from arthrodial membranes, uncalcified flexible cuticles, of H. americanus (Andersen, 1998) and C. pagurus (Andersen, 1998).
Two genes, CsCP (Callinectes sapidus cuticular protein)8.2 and CsCP8.5, were identified from the EST database of the blue crab Callinectes sapidus, derived from the mRNA differentially expressed in the hypodermis. The two proteins encoded by these genes, containing partial Rebers-Riddiford consensus sequences, may be homologous to CAP-1, CAP-2, DD5 (Wynn and Shafer, 2005) and possibly also to HaCP18.8.
Mollusca
Most molluscan species have shells, one of the most dominant biominerals after corals in the metazoans. Molluscan shells consist of aragonite or a composite of aragonite and calcite, two of the polymorphs of calcium carbonate. The shells have been thought of as remarkable examples of a matrix-mediated biomineralization (Watabe and Wilbur, 1960; Lowenstam, 1981; Lowenstam and Weiner, 1989; Belcher et al., 1996; Falini et al., 1996), and unusually acidic molecules in the shell matrix have been predicted to play important roles in molluscan biomineralization (Hare, 1963; Weiner and Hood, 1975; Mitterer, 1978; Weiner, 1979, 1983; Runnegar, 1984; Weiner and Addadi, 1991; Wheeler, 1992; Albeck et al., 1993; Gotliv et al., 2003).
Chronologically speaking, the fifth protein identified from molluscan shells, MSP-1, is the first unusually acidic protein to have been sequenced (Sarashina and Endo, 1998, 2001). MSP-1 is a soluble shell matrix glycoprotein extracted from the calcitic foliated layer of the scallop Patinopecten yessoensis. The protein is rich in serine, glycine and aspartic acid. MSP-1 has a characteristic modular structure containing a basic domain close to the N-terminus and a large sequence unit repeated four times; each unit comprises three domains, SG, D and K. The basic and SG domains of MSP-1 are somewhat similar to the basic and the glycine- and serine-rich domains in Lustrin A (Shen et al., 1997) as described below. The K domains in MSP-1 have putative basic cleavage motifs (Barr, 1991). The D domain, rich in aspartic acid and containing Asp-Gly-Ser-Asp and Asp-Ser-Asp motifs, is supposed to represent Ca2+-binding sites. Borbas et al. (1991) reported that phosphate groups were found in the organic matrix of calcitic foliated shell structures in higher amounts than in other structures. MSP-1 contains by far the largest number of putative phosphorylation sites (247 sites) of all the skeletal matrix proteins of metazoans, suggesting that MSP-1 is heavily phosphorylated and more acidic than theoretically predicted from the deduced amino acid sequence alone (pI = 3.2). Because many potential phosphorylation sites and N-linked glycosylation sites are found in the D domains, these post-translational modifications are probably involved in the binding of calcium ions. MSP-1 shows sequence similarity to phosphophoryn, and the SG domain of MSP-1 shows similarity to the glycine- and serine-rich domains in Lustrin A (Shen et al., 1997). Both similarities, however, may result from the biased amino acid compositions. Thus the evolutionary relationships among these proteins remain obscure.
The gene encoding a second unusually acidic shell matrix protein, Aspein, was identified from cDNA constructed from the mantle of P. fucata (Tsukamoto et al., 2004). Aspein is more acidic than MSP-1, containing a high proportion of Asp (60.4%), and its predicted isoelectric point is 1.45; this is probably the most acidic of all known proteins. Aspein has a relatively neutral N-terminal region followed by an unusually acidic region containing 58 polyaspartate blocks ranging from 2 to 10 residues in size punctuated by Ser-Gly dipeptides. The Ser in the dipeptides can be phosphorylated, but no putative N- or O-linked glycosylation sites were found. The transcript of Aspein is expressed only in the edge of the mantle corresponding to the prismatic layer (Takeuchi and Endo, 2005). Seven cDNA clones encoding the members of an unusually acidic shell matrix protein family, Asprich, were identified from a cDNA library constructed from the mantle of the bivalve Atrina rigida (Gotliv et al., 2005). Because the amino acid sequences of the signal peptides of Aspein and Asprich are highly similar, the two proteins are obviously homologous. Asprich has a short basic domain in the N-terminal region followed by the unusually acidic region. The unusually acidic region contains the putative calcium-binding domain, D-D-D-S-E-D-D-D-D-D-D-D-D, forming a specific pocket motif identified by alignment to Calsequestrin, a Ca2+-binding protein in skeletal and cardiac muscles of vertebrates (Yano and Zarain-Herzberg, 1994). Asprich also contains four D-E-A-D motifs, which are thought to bind ATP in the presence of Mg2+ in DEAD box RNA helicase (Pause and Sonenberg, 1992), suggesting that Asprich is capable of binding ATP through Mg2+. Antibodies raised against a synthetic peptide of a part of an acidic region of Asprich reacted with the matrix components of the calcitic prismatic layer, but not with those of the aragonitic nacreous layer.
Interestingly, all the three unusually acidic proteins, MSP-1, Aspein and Asprich, are present only in the calcitic shell layers, not in the aragonitic shell layer (Gotliv et al., 2005).
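The description of Aspein as a series of polyaspartate blocks punctuated by Ser-Gly dipeptides can be made concrete with a short scan over a sequence string. The following is only a minimal sketch: the function names and the toy fragment are hypothetical illustrations, not the published Aspein sequence, and the block-size limits of 2 to 10 residues are taken from the description above.

```python
import re

def polyasp_blocks(seq, min_len=2, max_len=10):
    """Return (start, length) for each run of consecutive Asp (D) residues
    whose length lies between min_len and max_len (inclusive)."""
    blocks = []
    for match in re.finditer(r"D+", seq):
        length = match.end() - match.start()
        if min_len <= length <= max_len:
            blocks.append((match.start(), length))
    return blocks

def asp_percent(seq):
    """Percentage of residues in seq that are Asp (D)."""
    return 100.0 * seq.count("D") / len(seq)

# Hypothetical toy fragment (NOT the real Aspein sequence):
# poly-D blocks of 3, 5, 2 and 10 residues separated by Ser-Gly dipeptides.
toy = "SG".join(["D" * 3, "D" * 5, "D" * 2, "D" * 10])

print(polyasp_blocks(toy))
print(round(asp_percent(toy), 1))
```

Applied to a real deduced amino acid sequence, the same scan would give the number and size distribution of polyaspartate blocks and the overall Asp content.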
Nacrein is the first molluscan shell matrix protein whose complete primary structure was determined. Nacrein is an EDTA-soluble protein extracted from the nacreous layer of the pearl oyster Pinctada fucata (Miyamoto et al., 1996). Nacrein has two potential N-linked glycosylation sites but the electrophoretic mobility of Nacrein did not change when treated by endopeptidase H and F. The protein was stained blue by Stains-all staining (Campbell et al., 1983), suggesting that the protein has a cation-binding ability. The protein exhibits a total of 26 tandem repeats of a G-X-N motif, where X is frequently D or N, a region suspected to be involved in calcium binding. The amino acid sequence of Nacrein shows the highest similarity to human carbonic anhydrase II, and Nacrein itself exhibits a carbonic anhydrase activity when examined using the CO2-Veronal indicator method (Yang et al., 1985). Two homologs of Nacrein, N66 (Kono et al., 2000) and the Turbo marmoratus Nacrein (Miyamoto et al., 2003), were identified by extraction from the nacreous layer of Pinctada maxima, a closely related species to P. fucata, and by screening the cDNA library constructed from the mantle of the gastropod T. marmoratus, respectively. The transcript of N66 is expressed in both the dorsal region and the edge of the mantle, suggesting that the protein is present in both the nacreous and the prismatic layer (Kono et al., 2000). Analysis by quantitative polymerase chain reaction also indicated that Nacrein transcript is present in both the dorsal and edge regions of the mantle in P. fucata (Takeuchi and Endo, 2005). In comparison with Nacrein, N66 has a longer repeat domain containing both G-X-N and G-N repeats. The Nacrein of T. marmoratus also has a longer repeat domain than that of P. fucata, but the repeat domain of the former is made up only of G-N repeats. These two repeat domains are far less acidic than that of P. fucata Nacrein, and therefore seem unlikely to have calcium-binding ability. On the other hand, the carbonic anhydrase domains of the two homologs are conserved as in Nacrein of P. fucata.
Five insoluble, possibly “framework” proteins that provide a structure in which mineralization takes place, MSI60, MSI31 (Sudo et al., 1997), MSI7 (Zhang et al., 2003), N16/Pearlin (Samata et al., 1999; Miyashita et al., 2000), and Prismalin-14 (Suzuki et al., 2004), have been identified from the pearl oyster P. fucata, and a homolog of N16/Pearlin was also identified from another pearl oyster, P. maxima (Kono et al., 2000). MSI60 was extracted from the nacreous layer, and the transcript of MSI60 was revealed to be expressed in the dorsal region of the mantle corresponding to the nacreous layer. MSI60 contains 11 polyalanine blocks of 9–13 residues, which give the protein a similarity to spider-silk fibroins, and 39 polyglycine blocks of 3–15 residues, which possibly contribute to forming a β-sheet conformation (Guerette et al., 1996). Polyaspartate blocks in both the N- and C-termini may bind calcium ions. The transcripts of MSI31 and MSI7 were isolated from a cDNA library constructed from the mantle of P. fucata. MSI31 contains ten polyglycine blocks of 3–5 residues in the N-terminal half of the molecule. The C-terminal half is an acidic region including an X-S-E-E-D-Y motif tandemly repeated six times, where X is D or E, and Y is T or M, suggesting that this region binds calcium ions (Sudo et al., 1997). MSI7 shows a high sequence similarity to the N-terminal half of MSI31 and might be a truncated variant of MSI31. An in vitro experiment using the method of Weiss et al. (2000), which monitors the drop in pH value of a CaCO3-saturated solution containing the test material, suggests that MSI7 accelerates the precipitation of calcium carbonate. Interestingly, the transcript of MSI7 is expressed in both the dorsal region, which forms the nacreous layer, and the edge, which forms the prismatic layer, of the mantle (Zhang et al., 2003), whereas the transcript of MSI31 is expressed only in the edge of the mantle (Sudo et al., 1997; Takeuchi and Endo, 2005).
Taking account of the fact that the acidic X-S-E-E-D-Y repeated motif is present only in MSI31, while the polyglycine blocks are present in both MSI31 and MSI7, the acidic motif could be important to formation of the calcitic prismatic layer.
N16/Pearlin (Samata et al., 1999; Miyashita et al., 2000) was extracted from the nacreous layer of P. fucata and the gene encoding N14 (Kono et al., 2000) was identified from the cDNA library constructed from the dorsal region of the mantle of P. maxima. Their amino acid sequences, rich in Gly, Tyr and Asn, have four acidic regions, N-G repeats perhaps forming the glycine loops, Cys residues in the first two-thirds of the molecule and putative phosphorylation sites (Samata et al., 1999; Kono et al., 2000). Although N16/Pearlin does not have a putative N-linked glycosylation site, periodic acid/Schiff (PAS) staining indicated that N16/Pearlin is glycosylated, suggesting that N16/Pearlin is an O-linked glycosylated protein. Ca2+-overlay analysis indicated that N16/Pearlin has a significant Ca2+-binding ability (Matsushiro et al., 2003). RT-PCR revealed that the transcripts of N16/Pearlin and N14 are expressed only in the dorsal region of the mantle. The crystallization experiment was carried out using CaCO3-saturated solution containing Mg2+ and four types of test material, N16/Pearlin, N14, N66 and the mixture of N14 and N66; it indicated that, in all cases, the protein inhibited calcium carbonate precipitation in solution, but induced platy aragonitic tablets when adsorbed on insoluble matrix membranes (Samata et al., 1999; Kono et al., 2000).
Prismalin-14 was extracted from the prismatic layer, and the transcript was revealed to be expressed in the edge of the mantle corresponding to the prismatic layer. Prismalin-14 exhibited Ca2+-binding ability in the Ca2+-overlay analysis, and also showed inhibitory activity on calcium carbonate precipitation in an in vitro dose-dependent anti-calcification assay (Inoue et al., 2001). Thus acidic residues located in both the N- and C-termini, as seen in MSI31, may bind calcium ions or the surface of calcium carbonate crystals. The glycine- and tyrosine-rich region in the C-terminal half of the molecule shows sequence similarity to keratin and may form the glycine loop which confers flexibility to the molecule (Steinert et al., 1984). A secondary structure prediction suggested that the region of P-I-Y-R repeats located in the N-terminal half forms a β-strand conformation. Prismalin-14 is similar to MSI31 in that they are both water-insoluble, acidic and possibly responsible for the formation of the prismatic layer of P. fucata. Prismalin-14, however, shows no sequence similarity to MSI31 (Suzuki et al., 2004).
Four water-soluble matrix proteins have been extracted from the nacreous layer of abalones. Two of them, Perlucin and Perlustrin, identified from Haliotis laevigata, are neutral proteins exhibiting remarkable sequence similarities to vertebrate extracellular proteins. The other two, AP7 and AP24, identified from the red abalone H. rufescens, are moderately acidic proteins exhibiting no significant sequence similarities to known proteins. Perlucin (Weiss et al., 2000; Mann et al., 2000) is an N-linked glycosylated protein and has the C-type (Ca2+-dependent) carbohydrate-binding domain of the C-type lectins, followed by a C-terminal domain containing a motif of ten amino acid residues repeated twice. This C-terminal motif exhibits a significant sequence similarity to a repeated motif of P32 adhesin (Mann et al., 2000), a protein of Mycoplasma genitalium, which is involved in cytadherence and subsequent development of disease pathology (Reddy et al., 1995). Solid phase assays (Gabius et al., 1989) showed that Perlucin has a divalent metal ion-dependent ability to bind to glycoproteins containing galactose or mannose/glucose, indicating that Perlucin is a functional C-type lectin (Mann et al., 2000). Perlucin is different from other C-type lectin-like domain proteins isolated so far from other biominerals, because these proteins are not thought to have a carbohydrate-binding ability as described below. Perlucin promotes calcium carbonate precipitation (Weiss et al., 2000; Blank et al., 2003) as shown by the in vitro precipitation assay (Wheeler et al., 1981). The second protein, Perlustrin (Weiss et al., 2000; 2001), showed significant sequence similarity to the N-terminal domain of mammalian insulin-like growth factor binding proteins (IGFBPs). Surface plasmon resonance assays indicated that Perlustrin has notable binding affinity for human IGFs. 
Several IGFBPs are known to participate in bone metabolism in vertebrates, and IGFBP-5, the major bone IGFBP, is thought to bind to hydroxyapatite (Campbell and Andress, 1997). This is reminiscent of the fact that the water-soluble component of nacre stimulates bone formation (Westbroek and Marin, 1998; Lamghari et al., 1999; Almeida et al., 2000; Mouriès et al., 2002; Rousseau et al., 2003; Milet et al., 2004), and may support the notion that phylogenetically distant biomineralization systems, nacre and bone, contain components inherited from common ancestors (Weiss et al., 2001).
The other two abalone water-soluble proteins, AP7 and AP24, might form a dimer through hydrophobic interactions, because the ethanol eluant containing them could not be resolved into the two proteins by hydrophobic interaction chromatography, although they could be resolved by SDS-PAGE (Michenfelder et al., 2003). Synthetic N-termini of 30 amino acid residues of AP7 and AP24, which contain putative Ca2+-binding sites, D-D and D-D-D-E-D, respectively, formed random coil-like structures, as confirmed by NMR spectroscopy (Wustman et al., 2004). The crystal growth assay with calcite crystals and synthetic N-termini of AP7 and AP24 in CaCO3-saturated solutions indicated that both AP7 and AP24 inhibited calcite growth. Michenfelder et al. (2003) thus inferred that these proteins contribute to the growth of aragonite. Electrophoretic mobility of AP24 changed when treated with N-glycosidase F, indicating that AP24 has N-linked carbohydrates (Michenfelder et al., 2003).
Lustrin A is an insoluble, possibly framework, protein identified from the nacreous layer of the abalone Haliotis rufescens. This protein is 1409 amino acid residues long excluding the signal peptide (19 amino acids), which makes it the longest invertebrate skeletal matrix protein completely sequenced so far, and has a complex modular structure comparable to that of MSP-1 (Sarashina and Endo, 1998, 2001). Lustrin A is rich in serine, proline and glycine, and contains ten cysteine-rich domains. Nine of them are interspersed by eight proline-rich domains, and a glycine- and serine-rich domain lies between the two cysteine-rich domains nearest to the C-terminus. They are followed by a basic domain and the C-terminal domain, and the latter is similar to protease inhibitors. The high sequence similarities among the cysteine-rich domains suggest that they undergo similar folding and perhaps contribute to protein-protein interactions. Given their density of proline residues, the proline-rich domains are predicted to adopt an extended rod-like structure (Williamson, 1994), and thus possibly serve to separate the cysteine-rich domains so that they can fold independently. The G-S and G-S-S-S repeats in the glycine- and serine-rich domain may form rubber-like glycine loops (Steinert et al., 1991) and the domain may have elastic properties. The basic and C-terminal domains may play a role in interacting with anionic molecules and protecting secreted matrix proteins.
Three completely sequenced shell matrix proteins have been identified from molluscan species other than pearl oysters and abalones. One is MSP-1 as described above, and the other two are Mucoperlin and Dermatopontin. Mucoperlin is a water-soluble acidic protein identified from a cDNA expression library constructed from the mantle tissues of the Mediterranean fan mussel Pinna nobilis (Marin et al., 2000). Mucoperlin contains a tandem repeat region consisting of 13 repeated units of 31 amino acid residues rich in Ser and Pro, suggesting that this region has a rigid rod-like conformation (Williamson, 1994). Four putative N-linked glycosylation sites and 27 putative O-linked glycosylation sites are recognized in this region, suggesting that Mucoperlin may be heavily glycosylated. The acidic C-terminal region of Mucoperlin contains densely distributed putative phosphorylation sites and two putative sulfated Tyr sites (Huttner, 1987). Mucoperlin exhibits a certain sequence similarity to PGM, a pig gastric mucin. Mucoperlin also has structural features characteristic of mucins, such as Pro/Ser-rich repeats and putative O-linked serine residues. The immunodetection of Mucoperlin performed using ELISA and dot-blot analysis with the antiserum directed against recombinant non-glycosylated Mucoperlin indicated that Mucoperlin is located in the nacreous layer but not in the prismatic layer. In situ immunohistochemical localization assay using an antibody elicited against the recombinant Mucoperlin indicated that Mucoperlin is localized around polygonal nacre tablets. Mucoperlin may control the lateral extension of the tablet (Marin et al., 2000; Marin and Luquet, 2004).
Dermatopontin is a water-soluble neutral protein extracted from the aragonitic cross-lamellar layer of the freshwater snail Biomphalaria glabrata (Marxen and Becker, 1997; Marxen et al., 2003). Dermatopontin was revealed to have a pI of 7.4 by isoelectric focusing (IEF). Alcian blue and Stains-all staining indicated that Dermatopontin is an acidic mucopolysaccharide and has no Ca2+-binding ability (Campbell et al., 1983; Butler et al., 1981). Dermatopontin contains a single N-glycosylation site, and tandem mass spectrometry (MS/MS) confirmed that the Asn at this site is modified by a short N-glycan. Dermatopontin is also called TRAMP (Tyrosine-Rich Acidic Matrix Protein), which is a component of extracellular matrix in vertebrates (Neame et al., 1989; Superti-Furga et al., 1993; Forbes et al., 1994) and invertebrates (Fujii et al., 1992; Schütze et al., 2001). Dermatopontin shows a widespread tissue distribution in mammals, including skin, skeletal muscle, heart, lung, kidney, cartilage and bone, with possible functions in cell-matrix interactions and matrix assembly (Forbes et al., 1994). In mammals, it binds to decorin, and the Dermatopontin-decorin complex accelerates collagen fibrillogenesis (MacBeath et al., 1993) and modifies the behaviour of TGF-β (Okamoto et al., 1996, 1999). No collagen has been identified in the shell matrix of B. glabrata (Marxen and Becker, 1997), but the shell matrix contains a considerable amount of proteoglycans (Marxen et al., 1998). Thus, Dermatopontin may interact with shell proteoglycans to organize an organic scaffold for mineralization (Marxen et al., 2003). Based on molecular phylogenetic analysis, Dermatopontin is inferred to have been coopted for molluscan calcification after the divergence of molluscs and other metazoan phyla, possibly in much more recent times than the “Cambrian explosion” (Sarashina et al., 2005).
Echinodermata
The mineral component of the spicules of sea urchins is magnesian calcite, the ratio of Mg2+/Ca2+ being approximately 1/19 (Okazaki and Inoué, 1976). Amorphous calcium carbonate transforms into calcite during spicule growth (Beniash et al., 1997). The spicule is likely to be composed of a single crystal based on the results of X-ray diffraction analyses (Beniash et al., 1999). Some researchers, however, have argued that sea urchin spicules are built from a polycrystalline aggregate based on the results of X-ray diffraction analyses, chemical analyses and scanning electron microscopical observation (Okazaki and Inoué, 1976), and this possibility cannot be excluded because X-ray diffraction is not able to distinguish the assembly from a single crystal in some cases (Robach et al., 2005). The larval spicule also contains organic matrix of about 0.1–1% by weight (Seto et al., 2004; Okazaki and Inoué, 1976), which includes approximately four dozen matrix proteins at least (Killian and Wilt, 1996). In the case of Strongylocentrotus purpuratus, the complete amino acid sequences of eight spicule matrix proteins have been determined so far (Wilt, 1999, 2002; Wilt et al., 2003; Wilt, 2005).
Two genes encoding two spicule matrix proteins, SM50 (Benson et al., 1987; Sucov et al., 1987; Katoh-Fukui et al., 1991) and SM30, or pNG7 SM30 (George et al., 1991), have been identified from cDNA expression libraries constructed from the whole embryo of the purple sea urchin Strongylocentrotus purpuratus using antiserum directed against the total spicule matrix proteins. Within the embryo, both the SM50 and SM30 transcripts are expressed exclusively in the primary mesenchyme cells (PMCs). SM50 is a basic protein with a molecular mass of approximately 50 kDa, and has no putative N-linked glycosylation sites. The protein has a Pro-rich region and a repeated motif of 13 amino acid residues, Q-P-G-F/M/W-G-N/G-Q-P-G-V/M-G-G-R/Q. The N-terminal half of SM50 shows sequence similarity to C-type lectin. In the upstream region of the SM50 gene there exists an ets binding site that functions as a positive cis-regulatory element (Kurokawa et al., 1999). SM50 is expressed in both embryonic spicules and adult spines (Richardson et al., 1989). At the late gastrula stage, the expression is stronger in the ventrolateral clusters than dorsal chain in the PMC syncytium (Lee et al., 1999). In situ immunohistochemical localization assay revealed that SM50 is localized around the spicule surface, on which spicule deposition occurs (Urry et al., 2000). LSM34, a homolog of SM50, was identified from the cDNA library constructed from mesenchymal and endoderm cell fractions of the painted sea urchin Lytechinus pictus using S. purpuratus cDNA encoding the spicule matrix protein as the probe (Livingston et al., 1991). This protein has a repeated motif of 7 amino acid residues, G-G-Q/R-Q-P-G-F, slightly different from that of SM50. Both repeated motifs of SM50 and LSM34 show sequence similarities to the repeated motifs of elastin and wheat gluten. 
The motifs have been considered to form a β-spiral supersecondary structure and confer elastic properties to the molecules (Venkatachalam and Urry, 1981; Miles et al., 1991). Peled-Kamar et al. (2002) showed that inhibition of LSM34 expression resulted in a block of spicule elongation. They injected antisense oligonucleotides directed against LSM34 transcript to the blastocoel of L. pictus embryos at the hatching blastula stage. Although small, possibly calcitic granules were recognized in the PMCs of antisense-treated embryos, those embryos had no apparent spicules. These results suggest that LSM34 is essential for spicule elongation, even if initial nucleation does not require LSM34. HSM41, a second homolog of SM50, was identified from the cDNA library constructed from the whole embryo of the Japanese sea urchin Hemicentrotus pulcherrimus screened with a cDNA clone of SM50 (Katoh-Fukui et al., 1992). HSM41 has a repeated motif of 13 amino acid residues, Q-P-G-F-G-N-Q-P-G-M/V-G-G-R/Q/N. This repeated motif is more similar to that of SM50 than to that of LSM34, obviously reflecting the evolutionary relationship of the three species.
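The repeated motifs quoted above are written as consensus strings in which a slash separates alternative residues at one position. As an illustration of how such a consensus can be used to scan a sequence, the sketch below converts it into a regular expression. The helper name `consensus_to_regex` and the toy test sequence are hypothetical; the three consensus strings are those given in the text for SM50, LSM34 and HSM41.

```python
import re

def consensus_to_regex(consensus):
    """Convert a consensus such as 'Q-P-G-F/M/W-G' into a compiled regex.

    Positions are separated by '-', alternative residues at one position
    by '/'. Alternatives become a character class, e.g. 'F/M/W' -> '[FMW]'.
    """
    parts = []
    for position in consensus.split("-"):
        residues = position.split("/")
        if len(residues) == 1:
            parts.append(residues[0])
        else:
            parts.append("[" + "".join(residues) + "]")
    return re.compile("".join(parts))

# Consensus motifs as quoted in the text.
SM50 = "Q-P-G-F/M/W-G-N/G-Q-P-G-V/M-G-G-R/Q"
LSM34 = "G-G-Q/R-Q-P-G-F"
HSM41 = "Q-P-G-F-G-N-Q-P-G-M/V-G-G-R/Q/N"

sm50_re = consensus_to_regex(SM50)
hsm41_re = consensus_to_regex(HSM41)

# Hypothetical toy sequence made of two units that fit the SM50 consensus.
toy = "QPGFGNQPGVGGR" + "QPGMGGQPGMGGQ"
print(sm50_re.pattern)
print(len(sm50_re.findall(toy)))
```

Counting non-overlapping matches in this way gives only a rough repeat count, since real repeat units often deviate from the consensus at one or more positions.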
SM30, or pNG7 SM30, is an acidic matrix protein with a molecular mass of approximately 30 kDa (George et al., 1991) and possibly the most abundant spicule matrix protein judging from the signal intensities of 35S-labeled proteins separated by two-dimensional gel electrophoresis (Killian and Wilt, 1996). Southern blot analysis indicated that there are two to four copies of SM30 genes in the S. purpuratus haploid genome (Akasaka et al., 1994), while the SM50 gene occurs once per haploid genome (Sucov et al., 1987). Akasaka et al. (1994) isolated a genomic clone and found that at least two SM30 genes are tandemly arranged, designating them SM30-α and SM30-β. The former is completely and the latter is partially sequenced. The SM30 protein encoded by the pNG7 clone previously reported by George et al. (1991), or pNG7 SM30, shows a high sequence similarity to the SM30-α protein. Most of the amino acid differences between pNG7 SM30 and SM30-α, however, occur at the probably important sites conserved between SM30-α and other C-type lectins, suggesting that they may not be alleles but different forms of the SM30 proteins (Killian and Wilt, 1996). On the other hand, SDS-PAGE separations revealed that embryonic spicules contain SM30-A of 43 kDa and SM30-B of 46 kDa, and adult spines contain SM30-B and SM30-C of 49 kDa. Although SM30-α probably encodes SM30-A based on the expression patterns of the SM30-α transcript, we cannot be sure which SM30 gene encodes SM30-B or -C (Killian and Wilt, 1996). The observations on the electrophoretic mobilities of SM30-A, -B and -C treated with endoglycosidase F indicated that they are all N-linked glycosylated, consistent with the fact that each deduced amino acid sequence of pNG7 SM30 and SM30-α contains a putative N-linked glycosylation site (George et al., 1991; Akasaka et al., 1994). Both pNG7 SM30 and SM30-α proteins are rich in Pro and Ala, but do not have a conspicuous Pro-rich domain as in SM50.
SM30-α exhibits high sequence similarity to the carbohydrate recognition domain (CRD) of C-type lectins, but lacks a pair of cysteine residues. The Ca2+-binding sites, which are conserved in the CRDs of other C-type lectins, are not conserved in SM30-α (Killian and Wilt, 1996). In situ immunohistochemical localization assay revealed that SM30 is localized within the spicule (Urry et al., 2000).
Five cDNAs, abundantly expressed in PMCs at the time of PMC ingression, were identified from the cDNA library constructed from differentiated micromere cultures of S. purpuratus (Harkey et al., 1988). The complete primary structure of one of them, designated PM27, was determined (Harkey et al., 1995). PM27 is possibly a nonglycosylated protein. PM27 has no potential N-linked glycosylation sites and, although it has potential O-linked glycosylation sites, they have very weak potential. The N-terminal domain consists of eight tandem repeats of P-G-M-G and five tandem repeats of Q-G, showing sequence similarity to several vertebrate fiber-forming proteins such as keratin, collagen, elastin and wheat glutinin. This N-terminal repeat domain is followed by the domain exhibiting sequence similarity to C-type lectin. The immunolocalization with the antibodies raised against recombinant PM27 indicated that PM27 accumulated at the growing surface of the spicule tips and disappeared from the mature mid-shaft region of the spicules. These results suggest that PM27 is involved in skeletal growth rather than maintenance or structural integrity of the spicules. In Western blot analyses, both antisera directed against the two different domains of PM27 detected two proteins in larvae and three proteins in adult tests. Since one is probably common to both larval and adult tests, at least four proteins sharing a similar structure are expressed during skeletogenesis (Harkey et al., 1995). The results of Southern and Northern blot analyses, however, are consistent with the conclusion that PM27 is a single-copy gene (Harkey et al., 1988, 1995). Thus, the most plausible source of the multiple proteins is the post-translational modifications. The SM37 gene was found in the genomic DNA fragment that contains the SM50 gene. The two genes are linked at a distance of about 12 kb (Lee et al., 1999). 
The SM37 gene included the same or highly similar cis-regulatory elements to that of the SM50 gene (Makabe et al., 1995; Lee et al., 1999). At the late gastrula stage, the expression of SM37 is stronger in the ventrolateral clusters than in the dorsal chain in the PMC syncytium. The results of both RT-PCR using the skeletogenic mesenchyme cell culture and the whole-mount in situ hybridization indicated that, in the embryos, the transcript of SM37 is expressed exclusively in skeletogenic mesenchyme cells, exactly as in SM50. The probe excess titration measurements (Killian and Wilt, 1989), which allow determination of the absolute number of transcripts, revealed that the transcripts of SM37 and SM50 are quantitatively almost identical through embryonic development (Lee et al., 1999), and these temporal schedules of SM37 and SM50 expression are different from SM30 (Guss et al., 1997). These results suggest that the two genes encoding SM37 and SM50 are regulated coordinately, and SM37 is very probably also a spicule matrix protein like SM50. SM37 has three putative N-linked glycosylation sites. Although the overall structure of SM37 is similar to SM50, the repeated motif of SM37, the consensus form of which is G-A/G-G-A/G-G-G-A-G-A-G-G-R-W-N-P-N-Q, is fairly different from SM50 except for the scaffold of Gly, Gln and Pro (Lee et al., 1999).
Zhu et al. (2001) carried out a large-scale analysis of mRNAs expressed in PMCs at the gastrula stage of S. purpuratus. Two species of mRNAs were identified to be PMC-specific using whole-mount in situ hybridization. Thereafter they reported the complete amino acid sequences of two putative spicule matrix proteins encoded by these two mRNAs, SpSM32 (originally named SM50-related protein) and SpC-lectin, and another new putative spicule matrix protein SpSM29, the transcript of which is also expressed only in the PMCs (Illies et al., 2002).
SpSM29 and SpSM32 are basic proteins just like SM50, PM27 and SM37. SpSM29 has a Pro-rich repeat domain in the N-terminal side and a C-type lectin domain in the C-terminal side, similar to PM27, while SpSM32 has a C-type lectin domain in the N-terminal side and a Pro-rich repeat domain in the C-terminal side, similar to SM50 and SM37. The repeated motif of SpSM32 is P-X-Y, where X is usually Asn and Y is usually Gln. A region of approximately 200 nucleotides at the 5′ end of SpSM32 mRNA, corresponding to the 5′ untranslated region and the first 35 amino acid residues of SpSM32, is identical to the sequence of SM50 at the nucleotide level. One possibility is that SpSM32 and SM50 mRNAs share a common exon as the result of a cis-splicing. A single primary transcript including open reading frames of both SpSM32 and SM50 may undergo alternative cis-splicing. The SM50 gene is known to be closely linked to the SpSM37 gene as described above, so it is possible that the SpSM32 gene is also a member of this complex (Illies et al., 2002).
SpC-lectin is acidic and has a C-type lectin domain but does not have a Pro-rich repeat domain, just like SM30 proteins. Although SpC-lectin is possibly a spicule matrix protein, it is not yet clear whether the SpC-lectin proteins interact directly with the mineral phase of the spicule (Illies et al., 2002).
Prominent features of skeletal matrix proteins of invertebrates
Acidic nature
The complete amino acid sequences of 77 invertebrate skeletal matrix proteins have been published to date (Table 1, Figure 1). The primary structure information has revealed some prominent features characteristic of skeletal matrix proteins. One of the most conspicuous is their acidic nature.
Table 1.
List of matrix proteins in invertebrates referred to in this article. For all the proteins listed here, the complete primary structure has been determined. The source organisms, references and the DNA databank accession numbers are also listed. Numbers in parentheses are the accession numbers in the MIPS-Protein Sequence Database (Martinsried, Germany).
![i1342-8144-10-4-311-t101.gif](ContentImages/Journals/jpal/10/4/prpsj.10.311/graphic/WebImages/i1342-8144-10-4-311-t101.gif)
Table 1.
Continued.
![i1342-8144-10-4-311-t102.gif](ContentImages/Journals/jpal/10/4/prpsj.10.311/graphic/WebImages/i1342-8144-10-4-311-t102.gif)
Figure 1.
Amino acid contents by percentage and isoelectric points of invertebrate skeletal matrix proteins the complete primary structure of which has been reported. The black bars on the left side represent percent aspartic acid residues. Dark grey and light grey bars represent glutamic acid residues and putative phosphorylation sites, respectively. Those three kinds of residues contribute to the acidic nature of proteins. Black bars on the right side represent basic residues including arginine, lysine and histidine residues.
![i1342-8144-10-4-311-f101.gif](ContentImages/Journals/jpal/10/4/prpsj.10.311/graphic/WebImages/i1342-8144-10-4-311-f101.gif)
Aspartic acid (Asp) has long been known as a major component of skeletal matrix proteins, and has been regarded as important in biomineralization processes because, as an acidic residue, it can interact with Ca2+ (Hare, 1963; Weiner and Hood, 1975; Mitterer, 1978; Weiner, 1979, 1983; Runnegar, 1984; Weiner and Addadi, 1991; Wheeler, 1992; Albeck et al., 1993; Gotliv et al., 2003). There were, however, some suspicions that the amino acids detected as aspartic acid in the amino acid composition analysis might in fact be present as their amide form, asparagine, in the matrix proteins (Young, 1971; Crenshaw, 1972, 1982). But characterization of the primary structure of the scallop shell protein MSP-1 (Sarashina and Endo, 1998, 2001) clearly demonstrated that there indeed exists a shell matrix protein rich in aspartate (Asp) residues (20 mol%). Based on the MSP-1 sequence, the gene sequences for the even more Asp-rich shell matrix proteins Aspein (60%) and Asprich (45%) were isolated (Tsukamoto et al., 2004; Gotliv et al., 2005). In fact, these three shell matrix proteins have the lowest isoelectric points (MSP-1, 3.2; Aspein, 1.5; Asprich, 2.5) among the 77 sequenced invertebrate skeletal matrix proteins. Among these 77 proteins, 43 are acidic (pI < 6.0), 20 neutral and 14 basic (pI > 8.0), suggesting that acidic skeletal matrix proteins are more common than basic ones. Only in echinoderms does the number of known basic skeletal matrix proteins (5) exceed that of acidic ones (2). This is likely to reflect sampling bias rather than the true nature of the spicule matrix proteins, because it is technically difficult to isolate unusually acidic proteins as a single protein (Wheeler et al., 1981; Gotliv et al., 2003). Two-dimensional gel electrophoreses of water-soluble spicule matrix proteins of the sea urchin S. purpuratus revealed that the majority of them are acidic (Killian and Wilt, 1996), supporting the interpretation that the basic nature of sequenced echinoderm spicule matrix proteins reflects this technical bias.
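The pI-based grouping used above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_by_pi` and the dictionary names are hypothetical, the cut-offs (pI < 6.0 acidic, pI > 8.0 basic) and the three pI values are those quoted in the text, and the tally simply restates the reported counts rather than recomputing them from sequences.

```python
def classify_by_pi(pi, acidic_below=6.0, basic_above=8.0):
    """Return 'acidic', 'neutral' or 'basic' using the cut-offs quoted in the text."""
    if pi < acidic_below:
        return "acidic"
    if pi > basic_above:
        return "basic"
    return "neutral"


# pI values quoted in the text for the three most acidic shell matrix proteins.
reported_pi = {"MSP-1": 3.2, "Aspein": 1.5, "Asprich": 2.5}
classes = {name: classify_by_pi(pi) for name, pi in reported_pi.items()}

# Counts reported in the text for the 77 sequenced proteins (not recomputed here).
tally = {"acidic": 43, "neutral": 20, "basic": 14}
total = sum(tally.values())
acidic_fraction = tally["acidic"] / total

print(classes)
print(total, round(acidic_fraction, 2))
```

Under these cut-offs all three proteins fall in the acidic class, and the reported counts sum to the 77 proteins considered.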
The acidic nature of skeletal matrix proteins is mainly due to a high content of the acidic amino acid residues Asp and Glu, although phosphorylation, sulfation and glycosylation may also be important. Among the 77 sequenced invertebrate skeletal matrix proteins, the proteins unusually rich in acidic amino acids (> 20%) are Selenoprotein M (Porifera), CaCP5.75, CsCP8.2, CsCP8.5, Orchestin, Crustocalcin/DD4, DD9A, CAP-1, CAP-2 (Arthropoda), MSP-1, Aspein and Asprich (Mollusca). Among these, the unusually Asp-rich (> 15%) proteins are CsCP8.2, CsCP8.5, Orchestin, CAP-1 (Arthropoda), MSP-1, Aspein and Asprich (Mollusca), and the unusually Glu-rich (> 15%) proteins are Selenoprotein M (Porifera), Crustocalcin/DD4 and CAP-2 (Arthropoda). Moreover, Crustocalcin/DD4 contains a conspicuous Glu-rich region, while MSP-1, Aspein and Asprich have conspicuous Asp-rich regions. These facts suggest that, in arthropods, not only Asp but also Glu is important for providing negative charges to skeletal matrix proteins.
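The composition thresholds above can be illustrated with a minimal Python sketch. It assumes that the percentages refer to residue counts over the full one-letter sequence; the function names and the toy sequence are hypothetical and do not correspond to any protein in Table 1.

```python
from collections import Counter


def acidic_composition(seq):
    """Percent Asp (D), Glu (E) and their sum in a one-letter protein sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    asp = 100.0 * counts["D"] / len(seq)
    glu = 100.0 * counts["E"] / len(seq)
    return {"Asp": asp, "Glu": glu, "acidic": asp + glu}


def composition_labels(comp):
    """Apply the thresholds quoted in the text (> 20% acidic, > 15% Asp or Glu)."""
    labels = []
    if comp["acidic"] > 20:
        labels.append("acidic-rich")
    if comp["Asp"] > 15:
        labels.append("Asp-rich")
    if comp["Glu"] > 15:
        labels.append("Glu-rich")
    return labels


toy = "DDDDDEGSGS"  # made-up sequence, for illustration only
comp = acidic_composition(toy)
labels = composition_labels(comp)
print(comp, labels)
```

Putative phosphorylation sites, which also contribute to acidity (Figure 1), are deliberately left out of this sketch.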
Repeated structures
Repeated sequences are commonly found among skeletal matrix proteins. In fact, they are found in skeletal matrix proteins in all five invertebrate phyla from which sequence data have been obtained. Most repeated sequences are rather short, ten or fewer amino acids in length, such as D-S or S-D (Orchestin), G-N or N-G (N66, TmNacrein, N16/Pearlin), G-S or S-G (MSP-1, Lustrin A), Q-G (PM27), S-D-E, S-R-E (Orchestin), D-S-D (MSP-1), G-S-S (Lustrin A), G-X-N (Nacrein, N66), P-N-Q (SpSM32), S-D-E-S (Orchestin), D-G-S-D (MSP-1), P-I-Y-R (Prismalin-14), P-G-M-G (PM27), D-D-S-R-E (Orchestin), G-G-G-G-S (MSI60), E-E-D-M/T-E/S (MSI30), Q-Q-A-A-P-A/T (GAMP), X-S-E-E-D-Y (MSI30), X-A-G-X-X-P-Y (Pb CP-12.7), G-G-Q/R-Q-P-G-F (LSM34), E-S-R/E-E-E-P-R-K-L (Orchestin) and N/D-S-L-H-A-N-L-Q-Q-R (Perlucin). Some repeated sequences of skeletal matrix proteins are longer, ranging from 13 to 31 amino acids (a.a.) in length. They are recognized in SM50, HSM41 (13 a.a.), SM37 (17 a.a.), Spicule-associated protein (20 a.a.), Galaxin (27–31 a.a.) and Mucoperlin (31 a.a.). Gly, Ser, Glu, Asp, Gln, Pro and Asn residues occur frequently in these repeated motifs. The functions proposed for the repeats include binding to chitin (P-I-Y-R), binding to some macromolecules (repeats in Galaxin), binding to membrane (Q-Q-A-A-P-A/T, repeats in Spicule-associated protein), conferring flexibility (N-G, G-N, S-G, G-S, G-S-S, G-G-G-G-S, G-G-Q/R-Q-P-G-F, N/D-S-L-H-A-N-L-Q-Q-R and repeats in SM50 and HSM41), conferring a rod-like property (repeats in Mucoperlin) and Ca2+-binding in the many repeated motifs which have acidic properties.
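As an illustration of how short tandem repeats of the kind listed above might be located in a sequence, the following Python sketch scans for perfect, contiguous copies of a unit of fixed length. It is an assumption-laden toy: the function name `tandem_repeats` is hypothetical, degenerate positions (X, M/T and the like) and imperfect copies are not handled, and the test strings are made up rather than taken from real proteins.

```python
import re


def tandem_repeats(seq, unit_len, min_copies=3):
    """Find runs of at least `min_copies` exact, contiguous copies of a unit.

    Returns a list of (start_index, unit, copies) for non-overlapping runs,
    scanning left to right.
    """
    # ([A-Z]{k}) captures a candidate unit; \1{n-1,} requires further copies.
    pattern = re.compile(r"([A-Z]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        copies = len(m.group(0)) // unit_len
        hits.append((m.start(), m.group(1), copies))
    return hits


# Made-up sequences containing a D-S dipeptide run and a G-G-G-G-S run.
print(tandem_repeats("GDSDSDSDSG", 2))
print(tandem_repeats("AAGGGGSGGGGSGGGGSPP", 5))
```

Real motif searches would normally allow mismatches and variable positions; exact matching is used here only to keep the idea visible.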
Extremely long repeated motifs are found in three skeletal matrix proteins, Lustrin A (75–88 a.a.), DD5 (93–98 a.a.) and MSP-1 (158–177 a.a.). Lustrin A and MSP-1 are molluscan shell matrix proteins having a complex modular structure. The long repeated motifs of MSP-1 and Lustrin A contain a Gly- and Ser-rich domain having an elastic property and a Pro-rich domain having an extended rod-like structure, respectively. These domains may help to separate the functionally important domains in the repeated motifs so that they can fold independently and bind to Ca2+ (in MSP-1) or other proteins (in Lustrin A). Each of the 13 repeated motifs in DD5 contains the Rebers-Riddiford sequence, a chitin-binding site, but does not contain a connecting domain such as the Gly- and Ser-rich domain in MSP-1 or the Pro-rich domain in Lustrin A. One DD5 protein molecule may be cross-linked to 13 chitin fibers to form a rigid and compact macromolecular network.
Although various functions for the repeated sequences have been proposed, the commonality of the short repeated structures among different animal phyla appears to suggest strong functional constraints such as interactions with the regular arrangement of ions in crystal lattices. By contrast, the long repeated units may contain domains which fold and function independently of the adjacent repeated units. Thus, the long repeats may not necessarily reflect the regular arrangement of ions in crystals.
C-type lectin-like domains
The carbohydrate recognition domains (CRDs) of C-type lectins are known to mediate sugar binding in the presence of Ca2+ (Drickamer, 1999). With the exception of Tetranectin, found in shark cartilage, the skeletal matrix proteins known to contain C-type lectin-like domains (CTLDs) are all calcium carbonate skeletal matrix proteins: molluscan Perlucin, all ten echinoderm spicule matrix proteins, and several vertebrate skeletal matrix proteins, namely Lithostathine (originally named Pancreatic stone protein; De Caro et al., 1987; Bertrand et al., 1996), Ovocleidin-17 (Hincke et al., 1995; Mann and Siedler, 1999), Ovocleidin-116 (Hincke et al., 1999), Ansocalcin (Lakshminarayanan et al., 2002, 2003), Struthiocalcin-1, Struthiocalcin-2 (Mann and Siedler, 2004), Dromaiocalcin-1, Dromaiocalcin-2, Rheacalcin-1 and Rheacalcin-2 (Mann, 2004). Interestingly, CTLDs found in calcium carbonate skeletal matrix proteins in both vertebrates and echinoderms do not bind carbohydrates and do not retain the carbohydrate-binding amino acid residues which are characteristic of lectins. Lithostathine, a vertebrate pancreatic stone protein, is known to contain a putative Ca2+-binding site but it is not located within the CTLD (Bertrand et al., 1996). It is difficult to predict functions of these CTLDs in calcium carbonate skeletal matrix proteins, as CTLDs are known to exhibit diverse functions (Drickamer, 1999). Perlucin in molluscs, in contrast to the deuterostome calcium carbonate skeletal CTLD proteins, has a functional carbohydrate-binding ability, and may be involved in Ca2+-dependent glycoprotein-protein interaction within the skeletal matrix (Mann et al., 2000). Thus, functions of CTLDs found in skeletal matrix proteins are possibly different between deuterostomes and molluscs.
Tetranectin is a member of the family of mammalian C-type lectins and is found in both the blood and the extracellular matrix. Neame et al. (1992) identified a homolog of Tetranectin from reef shark cartilage, which is possibly capable of binding sugars. Tetranectin is also thought to play an important role during osteogenesis in mice (Wewer et al., 1994), and its function is highly likely to be different from that of the calcium carbonate skeletal CTLD proteins.
Glycosylation
Glycosylation is a common modification of skeletal matrix proteins. General functions of protein glycosylation include protein folding and stability, receptor functioning, cell adhesion and signal transduction (Bhatia and Mukhopadhyay, 1999). Cho et al. (1996) proposed an oligosaccharide-mineral binding function based on the observation that oligosaccharide moieties are found in association with calcium carbonates of sea urchin spicules. In many cases, however, the significance of glycosylation in skeletal matrix proteins has not been clarified.
Implication and conclusion
From the morphological viewpoint, skeletons of many metazoan phyla appear to have evolved independently. However, the finding that molluscan shell nacre can induce osteogenesis in vertebrates led to the suspicion that metazoans belonging to distantly related phyla share, at least in part, common underlying machineries of biomineralization (Westbroek and Marin, 1998). Recently some skeletonization regulatory genes and their networks in invertebrates as well as vertebrates have been clarified. So far, the results seem to support the idea of independent origins of metazoan biomineralization, a premise which is also supported by the complete primary structures of skeletal matrix proteins as summarized above. Of course, we currently have very little information on the molecules involved in biomineralization. We cannot yet draw definitive conclusions, and further information is awaited.
The repeated sequences are one of the common features of the skeletal matrix proteins. As described above, the amino acid compositions of the repeated sequences are biased and somewhat similar among metazoan phyla. But the amino acid sequences are different from each other. So this commonality of repeated sequences in metazoan skeletal matrix proteins may be a result of convergence. The acidic nature is also a major common feature of the skeletal matrix proteins. This common feature is also considered as convergence because significant sequence similarities are not observed among acidic skeletal proteins of different phyla. For instance, the unusually acidic shell matrix proteins of molluscs contain exclusively Asp as the acidic amino acid, whereas those of arthropods also contain Glu in addition to Asp. C-type lectin-like domains (CTLD) are found in some skeletal matrix proteins in both molluscs and deuterostomes, implying at first sight that this part of the underlying machineries has a common origin. The sites important for carbohydrate-binding, however, are not conserved between molluscs and deuterostomes, and thus the roles of CTLD in biomineralization are likely to be totally different between the two groups. These two groups are likely to have coopted the same gene independently after they diverged. Eight out of the known 37 arthropod skeletal matrix proteins have the Rebers-Riddiford consensus sequence which is found in non-calcified cuticular proteins in arthropods. This observation indicates that these arthropod biomineralization proteins have been coopted from the cuticular proteins after arthropods diverged from other metazoan groups. Dermatopontin, a molluscan shell matrix protein, is also considered to have been coopted for biomineralization after molluscs diverged from other metazoan groups based on the molecular phylogenetic analysis.
The information obtained from regulatory genes involved in biomineralization, their networks and invertebrate skeletal matrix proteins, therefore, seems to support the idea that the machineries involved in metazoan biomineralization have evolved independently after the divergence of metazoan phyla. Although we find some homologous components among metazoan biomineralization machineries, they can be explained by independent cooptions of common components. At least some metazoan groups coopted for biomineralization common components that had previously served other functions. This scenario is consistent with the observation that molluscan shell nacre or matrices have the ability to induce osteogenesis in vertebrates. Marin et al. (1996) detected immunological cross-reactivities between skeletal water-soluble matrices and mucous excretions of both a cnidarian and molluscs. This result seems to suggest that the calcifying machineries of the two phyla derived, at least partly, from common noncalcifying precursors. This observation also seems to be consistent with our scenario.
Furthermore, some researchers raise a possibility that metazoan biomineralization systems have been derived from a system that the last common ancestor of metazoans possessed. More specifically, Kirschvink and Hagadorn (2000) hypothesized that the ancestral eukaryotes inherited the ability to precipitate magnetite from magnetotactic bacteria during endosymbiotic events. The magnetite system would have to have been present in most of the metazoan phyla during the “Cambrian explosion”, and could have provided a template for other biomineralization systems to evolve. At least so far, however, we have not found significant evidence to support this hypothesis.