Spider genome research reveals the diversity of spider silk genes and their complex expression patterns

<h1>Background</h1>

This article summarizes a genome study of the golden orb-weaver Nephila clavipes. It focuses on the genetic diversity and complex expression patterns of spider silk proteins (spidroins), with a view to explaining the diversity of spider silk from these two angles.

Spiders have been evolving for more than 380 million years and comprise more than 46,000 species. Their diverse silks are used for catching prey and for reproduction. Spider silk is stronger than steel and tougher than Kevlar, yet lighter than both. It is also extensible and conductive, inhibits bacterial growth, and provokes little rejection by the human immune system, so new materials based on spider silk hold great potential for innovation in medicine and industry. To make better use of these properties, we must understand the genetic basis, functional diversity and production mechanism of spider silk.

A female orb-weaving spider can have up to seven morphologically distinct types of silk gland. Each gland produces silk with different biological properties as a result of its own gene expression profile. The seven kinds of silk are:

1. Major ampullate silk has high tensile strength and serves as the dragline during web building;

2. Minor ampullate silk is used to form an inelastic temporary spiral during web construction;

3. Piriform silk acts as a cement that bonds fibers to other substrates;

4. Aciniform silk is used to wrap stored prey and the egg sac;

5. Tubuliform (cylindrical) silk forms the tough outer layer of the egg sac;

6. Flagelliform silk is highly extensible and is used to capture prey;

7. Aggregate silk is a sticky glue used to capture prey.

Some spiders have only a subset of these silk types, and some produce others, such as cribellate silk. Each spider species has its own set of glands, producing silks suited to its specific needs.

Spider silk is composed mainly of silk proteins called spidroins. A spidroin has conserved N-terminal and C-terminal domains flanking a region of repeated motifs, and the composition and number of these motifs give each silk its specific physical properties. Although orb-weaver silk has been studied for decades, the catalog of orb-weaver spidroins is still incomplete.

The velvet spider, whose genome has been sequenced, has only 19 spidroins and lacks flagelliform and aggregate silks, which limits the diversity of spidroin sequences that can be observed within that species. In contrast, female orb-weavers use all seven silk gland types. The first spidroin ever identified came from N. clavipes, and the species has since been used to study silk genes, structural diversity and the evolutionary history of the gene family, yet its genome had never been sequenced.

<h1>Assembly results</h1>

The N. clavipes genome is estimated at 3.45 Gb, of which about 55% is repetitive sequence. The authors assembled 2.44 Gb (contig N50 = 8.1 kb, scaffold N50 = 62.9 kb) from Illumina sequencing data at an overall coverage of 98.5X. The quality of the assembly is fairly modest.
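For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is computed. It is not the authors' pipeline: the function name `n50` and the toy contig lengths are illustrative assumptions. The last lines work out what share of the estimated genome the assembly covers, using only the two sizes given in the text.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy contig lengths (hypothetical, not from the study).
toy_contigs = [2, 3, 4, 5, 6, 10]
print(n50(toy_contigs))  # 6

# Share of the estimated genome covered by the assembly (sizes in Gb, from the text).
estimated_size_gb = 3.45
assembled_size_gb = 2.44
print(round(assembled_size_gb / estimated_size_gb, 3))  # 0.707
```

A low N50 relative to gene length means that long genes are likely to be split across several contigs, which is one reason highly repetitive genes are hard to recover from short-read assemblies.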

<h1>Research questions</h1>

1. Catalog of golden orb-weaver spidroins

By comparing the genome, the transcriptome and gene models, the authors obtained 28 candidate silk protein (spidroin) genes in N. clavipes. They used PCR of gene fragments and third-generation (long-read) sequencing to reconstruct the spidroin genes and determine their locations. For 27 of the 28 genes, the sequences encoding the C-terminal and N-terminal domains lie on the same scaffold, and full-length sequences were obtained for 20 spidroins. Only partial sequences were found for three spidroin genes, which suggests that further spidroin genes remain to be identified.

To assign the 28 candidate spidroins to the seven major spidroin classes, the authors compared the conserved C-terminal and N-terminal domains and the repeat motifs. Most of the spidroins fall into one of the seven classes, with new members in some classes (minor ampullate, flagelliform and aggregate). However, seven new spidroins in N. clavipes do not belong to any of the seven classes, which indicates that the classification is still incomplete.

The 20 full-length N. clavipes spidroins vary widely in length, from 407 to 5,939 amino acids. Using long-range PCR, the study found two new MiSp spidroins, MiSp-c and MiSp-d, which are longer than those reported previously. The assembly also shows that flagelliform spidroin genes contain multiple exons and can be alternatively spliced, which is consistent with previous studies. In addition, N. clavipes spidroins are rich in glycine, alanine and serine residues, a composition that is highly unusual compared with other genes in the genome.

To catalog the repeat sequences of the N. clavipes spidroins, the authors computationally identified and annotated motifs and observed 394 unique motif variants. Besides the previously reported motifs, they found hundreds of new motif variants, including new variants of previously classified motifs. In the 20 full-length spidroin genes, the repeated motif structure spans 50% to 96% of the coding region, and the complexity and diversity of these repeat structures are greater than previously reported.

To study this diversity, the unique motif variants were divided into 49 groups by homology. Variants related to the GXGGX, GPGGY and polyalanine motifs cluster into groups, as do new polyalanine-like variants. The new DTXSYXTGEY motif is the most frequent, occurring 554 times. Other high-frequency motifs include (i) GPGTTPGTI, (ii) poly-TTX, (iii) poly-GL, (iv) poly-SQ/XQQ and (v) non-alanine homopolymer runs. MaSp-g contains 73 unique motif variants from 20 motif groups. AgSp-d has the longest repeat motifs, and MaSp-f, MaSp-g and MiSp-c are the most diverse, each containing 20 of the 49 motif groups.

In N. clavipes, 46 of the 49 motif groups can be found across the spidroin genes and, more strikingly, 204 of the 260 shared motif variants are shared between spidroin classes. Shared motifs are therefore a common feature of spidroin genes, and MaSp-g contributes the largest number of them.
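The kind of motif annotation described above can be sketched with regular expressions. This is an illustration, not the authors' method: the three patterns, the four-residue threshold for a polyalanine run, the function names and the toy sequence are all assumptions made for the example. In a motif name, X stands for any residue.

```python
import re

# Illustrative patterns only. Patterns may overlap (GPGGY also fits GXGGX),
# so the per-motif counts are not mutually exclusive.
MOTIF_PATTERNS = {
    "GPGGY": re.compile(r"GPGGY"),
    "GXGGX": re.compile(r"G.GG."),
    "polyA": re.compile(r"A{4,}"),  # assumption: 4 or more alanines in a row
}


def count_motifs(protein):
    """Count non-overlapping matches of each pattern in a protein sequence."""
    return {name: len(pattern.findall(protein))
            for name, pattern in MOTIF_PATTERNS.items()}


def repeat_coverage(protein):
    """Fraction of residues covered by at least one motif match."""
    covered = [False] * len(protein)
    for pattern in MOTIF_PATTERNS.values():
        for match in pattern.finditer(protein):
            for i in range(match.start(), match.end()):
                covered[i] = True
    return sum(covered) / len(protein) if protein else 0.0


# Toy sequence (hypothetical, 25 residues), not a real spidroin.
toy_protein = "MS" + "GPGGY" + "GAGGQ" + "AAAAAA" + "GPGGY" + "TT"
print(count_motifs(toy_protein))     # {'GPGGY': 2, 'GXGGX': 3, 'polyA': 1}
print(repeat_coverage(toy_protein))  # 0.84
```

The coverage function corresponds to the idea of reporting what share of a coding region is made up of repeated motifs, as in the 50% to 96% range quoted for the full-length spidroins.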

The authors also found a higher-order repeat structure within the repeat motifs. They defined a cassette as a segment of two to four unique motif variants that occur consecutively in a spidroin gene, and found 506 distinct cassettes in the 20 full-length spidroins. The cassettes span 25% to 95% of the motif region and occur 1,440 times in total. Half of the high-frequency cassettes are tandem repeats of variants from the same motif group, whereas the others combine different motifs.

As for sharing between spidroin genes, 95% of cassettes are unique to a single gene. Unlike motifs, cassettes are shared mainly between the minor ampullate and major ampullate classes. These observations show that spidroin genes share motifs but assemble them into their own cassettes, which produces different gene functions and, in turn, the different physical properties of spider silks.
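The relationship between shared motifs and gene-specific cassettes can be illustrated with a short sketch. It assumes, for the purpose of the example, that a cassette is an ordered run of two to four consecutive motif labels that occurs at least twice in a gene. That working definition, the function names and the toy motif labels are assumptions, not the authors' implementation.

```python
from collections import Counter


def find_cassettes(motif_labels, min_len=2, max_len=4, min_count=2):
    """Return runs of min_len..max_len consecutive motif labels that occur
    at least min_count times (overlapping windows are counted)."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(motif_labels) - n + 1):
            counts[tuple(motif_labels[i:i + n])] += 1
    return {run: k for run, k in counts.items() if k >= min_count}


def gene_specific_fraction(cassettes_by_gene):
    """Fraction of distinct cassettes that are found in exactly one gene."""
    owners = {}
    for gene, cassettes in cassettes_by_gene.items():
        for run in cassettes:
            owners.setdefault(run, set()).add(gene)
    if not owners:
        return 0.0
    unique = sum(1 for genes in owners.values() if len(genes) == 1)
    return unique / len(owners)


# Two hypothetical genes built from the same three motifs A, B and C.
gene_a = ["A", "B", "A", "B", "C"]
gene_b = ["B", "C", "B", "C", "A"]
cassettes = {"gene_a": find_cassettes(gene_a), "gene_b": find_cassettes(gene_b)}
print(cassettes["gene_a"])                # {('A', 'B'): 2}
print(cassettes["gene_b"])                # {('B', 'C'): 2}
print(gene_specific_fraction(cassettes))  # 1.0
```

In this toy case both genes use the same motifs, yet every cassette belongs to only one gene, which mirrors the pattern reported in the text: motifs are widely shared, while cassettes are mostly gene-specific.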

2. Expression of spidroin genes in different tissues

Previous studies have suggested that spidroin expression is not gland-specific. To work out the expression pattern of spidroin genes, the authors quantified the 28 spidroins in different glands of three adult female spiders. They found that the transcripts within a single silk gland belong to different spidroin classes. As described in earlier work, a spidroin gene is generally most highly expressed in its corresponding morphological gland, but some genes are not expressed in the corresponding gland and some are expressed in all glands.

It is worth noting that a spidroin is named after the gland from which it was first cloned, which does not imply that its expression is specific to that gland. For example, the study found that tubuliform spidroin is expressed in multiple glands and is not most highly expressed in the tubuliform gland, possibly because the females sampled were not at the stage of producing egg-case silk. The set of spidroin genes expressed in a gland reflects that gland's specialization and hints at the role of each gene. Some previously unknown spidroin genes have the same expression patterns as known genes, so their functions can be hypothesized accordingly.

It is generally believed that spidroin genes are expressed only in the silk glands, but this is not the case for the new FLAG-b, which is most highly expressed in the venom glands. The RNA-seq results also support this finding. PR-1 (a known venom gene) and FLAG-a (a known flagelliform spidroin) were expressed in the expected tissues, and after normalizing FLAG-b expression against these controls, the authors found that its expression in the venom glands is 1,000 to 5,000 times that in the silk glands.
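The normalization step can be illustrated with a small calculation. This is only a sketch: the text does not give the exact formula, so a simple ratio to a reference gene measured in the same tissue is assumed, and every number below is a made-up toy value, not data from the study.

```python
def relative_expression(target, reference):
    """Expression of a target gene relative to a reference gene in one tissue."""
    if reference <= 0:
        raise ValueError("reference expression must be positive")
    return target / reference


def fold_difference(tissue_a, tissue_b):
    """Ratio of normalized target expression in tissue A to that in tissue B.

    Each tissue is a (target, reference) pair of expression values.
    """
    return relative_expression(*tissue_a) / relative_expression(*tissue_b)


# Hypothetical (target, reference) values in arbitrary units.
venom_gland = (16000.0, 8.0)
silk_gland = (8.0, 8.0)
print(fold_difference(venom_gland, silk_gland))  # 2000.0
```

Dividing by a reference measured in the same sample removes differences in the amount of input material, so the resulting fold difference reflects the tissues rather than the sample preparation.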

3. Extreme spidroin diversity and its evolutionary origins

Alternative splicing of multi-exon spidroin genes has been proposed as one source of spidroin diversity, although it had not previously been verified experimentally. The authors found that the splice-supporting reads of FLAG-b indicate two isoforms: a primary full-length isoform and a secondary isoform that lacks the second exon. Because the second exon of the full-length isoform encodes the C-terminal domain, the secondary isoform would lack that domain.

To identify non-spidroin genes that may contribute to silk production, the authors used transcriptome data to select 649 candidate genes that are highly or specifically expressed in the silk glands. Of these, 183 are highly homologous to known proteins, including enzymes such as kinases, proteases, dehydrogenases and acetyltransferases, most of which have been characterized in eukaryotic systems. The authors hoped to find transcripts of proteins involved in converting liquid silk into solid fiber, such as enzymes that maintain the pH gradient. Candidate genes that could generate ions to help maintain the pH gradient include 3 carbonic anhydrase orthologs, 4 thyroid peroxidase paralogs and 5 chorion peroxidase paralogs.

The authors found two evolutionary mechanisms that contribute to the diversity of spidroin loci. One is tandem duplication. The other is that the regions of spidroin genes encoding the C-terminal and N-terminal domains carry more than twice as much sequence polymorphism as other genes, which is evidence that spidroin loci have long been under balancing selection.

Summary of the discussion

1. Genome sequencing identified 28 spidroins, representing all known orb-weaver spidroin classes. Eight new spidroins were found, along with a large number of new repeat sequences.

2. RNA-seq revealed the expression patterns of spidroin genes across glands: some are expressed specifically in one gland, some are transcribed in several glands, and some are not most highly expressed in the gland they are named after.

3. Tandem gene duplication and high levels of polymorphism suggest that spidroin diversity has been maintained by natural selection.

4. Within and between spidroin classes, 66% of the repeated motif variants are shared, but 95% of cassettes are exclusive to a single spidroin. Shared motifs are assembled into different cassettes, which in turn form longer repeat sequences and give each spidroin class its unique physical properties.

Editor's comments

When carrying out a genome sequencing project, we need not rely on large-scale sampling, which tends to give broad but imprecise results. Instead, we can focus on the genes and functions unique to the species, carry out careful experiments and mine the data in depth, which can also yield good results.

Reference:

Babb P L, Lahens N F, Correa-Garhwal S M, et al. The Nephila clavipes genome highlights the diversity of spider silk genes and their complex expression. Nature Genetics, 2017.
