Funding: Funded by the National Key R&D Program of China (Grant No. 2020YFA0906800 to Q.S.) and the National Natural Science Foundation of China (Nos. 32270040 to Q.S., 32001022 to X.F., and 32370033 to Y.S.).
Abstract: Saccharolobus islandicus REY15A represents one of the very few archaeal models with versatile genetic tools, which include efficient genome editing, gene silencing, and robust protein expression systems. However, plasmid vectors constructed for this crenarchaeon thus far are based solely on the pRN2 cryptic plasmid. Although this plasmid coexists with pRN1 in its original host, early attempts to test pRN1-based vectors consistently failed to yield any stable host-vector system for Sa. islandicus. We hypothesized that this failure could be due to CRISPR immunity against pRN1 in this archaeon. We identified a putative target sequence (target N1) in orf904, which encodes a putative replicase on pRN1. Mutated targets (N1a, N1b, and N1c) were then designed and tested for their capability to escape host CRISPR immunity using a plasmid interference assay. The results revealed that the original target triggered CRISPR immunity in this archaeon, whereas none of the three mutated targets did, indicating that all the designed target mutations evaded host immunity. These mutated targets were then incorporated into orf904 individually, yielding corresponding mutated pRN1 backbones with which shuttle plasmids were constructed (pN1aSD, pN1bSD, and pN1cSD). Sa. islandicus transformation revealed that pN1aSD and pN1bSD were functional shuttle vectors, but pN1cSD lost the capability for replication. These results indicate that the missense mutations in the conserved helicase domain in pN1c inactivated the replicase. We further showed that pRN1-based and pRN2-based vectors were stably maintained in the archaeal cells either alone or in combination, yielding a dual plasmid system for genetic study with this important archaeal model.
Funding: Grants from the Chinese National Transgenic Science and Technology Program (2019ZX08010003 to QS), the National Natural Science Foundation of China (31771380 to QS), the Qingdao Applied Research Fund for postdoctoral researchers (62450079311107 to ZY), and the State Key Laboratory of Microbial Technology and Shandong University.
Abstract: CRISPR-Cas systems provide small RNA-based adaptive immunity that defends against invasive genetic elements in archaea and bacteria. Organisms of Sulfolobales, an order of thermophilic acidophiles belonging to the crenarchaeotal phylum, usually contain both type I and type III CRISPR-Cas systems. Two species, Saccharolobus solfataricus and Sulfolobus islandicus, have been important models for CRISPR study in archaea, and knowledge obtained from these studies has greatly expanded our understanding of the molecular mechanisms of antiviral defense in all three steps: adaptation, expression and crRNA processing, and interference. Four subtypes of CRISPR-Cas systems are common in these organisms: I-A, I-D, III-B, and III-D. Their cas genes form functional modules; for example, all genes required for adaptation and for interference in the I-A immune system are clustered together to form the aCas and iCas modules, respectively. Genetic assays have been developed to study the mechanisms of adaptation and interference by different CRISPR-Cas systems in these model archaea, and these methodologies have been useful in demonstrating protospacer-adjacent motif (PAM)-dependent DNA interference by I-A interference modules and multiple interference activities by III-B Cmr systems. Ribonucleoprotein effector complexes have been isolated for Sulfolobales III-B and III-D systems, and their biochemical characterization has greatly enriched the knowledge of the molecular mechanisms of these novel antiviral immune responses.
Abstract: Entity resolution (ER) is the problem of identifying and grouping different manifestations of the same real-world object. Algorithmic approaches have been developed, and most ER tasks achieve their best performance under supervised learning. However, the prohibitive cost of labeling training data is still a huge obstacle to detecting duplicate query records from online sources. Furthermore, the unique combinations of noisy data with missing elements make ER tasks more challenging. To address this, transfer learning has been adopted to adaptively share learned common structures of similarity scoring problems between multiple sources. Although such techniques reduce the labeling cost so that it is linear with respect to the number of sources, their random sampling strategy is not successful enough to handle the ordinary sample imbalance problem. In this paper, we present a novel multi-source active transfer learning framework that jointly selects fewer data instances from all sources to train classifiers with constant precision/recall. The intuition behind our approach is to actively label the most informative samples while adaptively transferring collective knowledge between sources. In this way, the learned classifiers can be both label-economical and flexible, even for imbalanced or quality-diverse sources. We compare our method with state-of-the-art approaches on real-world datasets. Our experimental results demonstrate that our active transfer learning algorithm can achieve impressive performance with far fewer labeled samples for record matching with numerous and varied sources.
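The idea of "actively labeling the most informative samples" can be illustrated with plain uncertainty sampling. This is only a minimal sketch of the general active-learning step, not the framework described in the abstract: it covers a single source, omits the transfer between sources, and assumes that "informative" means "predicted match probability closest to 0.5". The names `select_most_uncertain` and `mean_similarity` and the toy pool are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Similarity features for one candidate record pair (hypothetical representation).
Pair = Tuple[float, ...]


def select_most_uncertain(
    pool: Sequence[Pair],
    match_probability: Callable[[Pair], float],
    budget: int,
) -> List[int]:
    """Return indices of the `budget` pool items whose predicted match
    probability is closest to 0.5, i.e. where the scorer is least certain."""
    ranked = sorted(
        range(len(pool)),
        key=lambda i: abs(match_probability(pool[i]) - 0.5),
    )
    return ranked[:budget]


def mean_similarity(pair: Pair) -> float:
    """Hypothetical stand-in for a trained classifier: average the features."""
    return sum(pair) / len(pair)


# Toy pool of unlabeled candidate pairs (assumed data, for illustration only).
pool = [(1.0, 1.0), (0.0, 0.0), (0.5, 1.0), (0.25, 0.5), (0.5, 0.5)]
to_label = select_most_uncertain(pool, mean_similarity, budget=2)
print(to_label)
```

In a full loop, the selected pairs would be sent to an annotator, added to the training set, and the classifier retrained before the next selection round.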
Funding: Supported by the National Key Program (2016YFC0503200) from the Ministry of Science and Technology of China; a special grant for the giant panda from the State Forestry Administration of the People's Republic of China; the Fundamental Research Funds for the Central Universities of the People's Republic of China; the Foundation of Key Laboratory of State Forestry and Grassland Administration (State Park Administration) on Conservation Biology of Rare Animals in the Giant Panda National Park (KLSFGAGP2020.002); and the Guangdong Provincial Key Laboratory of Genome Read and Write (2017B030301011).
Abstract: Extant giant pandas are divided into Sichuan and Qinling subspecies. The giant panda has many species-specific characteristics, including comparatively small organs for its body size, small genitalia in male individuals, and low reproduction. Here, we report the most contiguous, high-quality chromosome-level genomes of the two extant giant panda subspecies to date, including the first genome assembly of the Qinling subspecies. Compared with the previously assembled giant panda genomes based on short reads, our two assembled genomes increased contiguity over 200-fold at the contig level. Additional sequencing of 25 individuals dated the divergence of the Sichuan and Qinling subspecies into two distinct clusters to between 10,000 and 12,000 years ago. Comparative genomic analyses identified the loss of regulatory elements in the dachshund family transcription factor 2 (DACH2) gene and specific changes in the synaptotagmin 6 (SYT6) gene, which may be responsible for the reduced fertility of the giant panda. Positive selection analysis between the two subspecies indicated that the reproduction-associated IQ motif containing D (IQCD) gene may at least partly explain the different reproduction rates of the two subspecies. Furthermore, several genes in the Hippo pathway exhibited signs of rapid evolution with giant panda-specific variants and divergent regulatory elements, which may contribute to the reduced inner organ sizes of the giant panda.
Funding: This work was partially supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. 61170020, 61402311, 61440053) and the US National Science Foundation (IIS-1115417).
Abstract: With the rapid development of location-based services, a particularly important aspect of start-up marketing research is to explore and characterize points of interest (PoIs) such as restaurants and hotels on maps. However, due to the lack of direct access to PoI databases, it is necessary to rely on existing APIs to query PoIs within a region and calculate PoI statistics. Unfortunately, public APIs generally impose a limit on the maximum number of queries. Therefore, we propose effective and efficient sampling methods based on road networks to sample PoIs on maps and provide unbiased estimators for calculating PoI statistics. The underlying observation is that, in general, the denser the road network, the denser the distribution of PoIs within a region. Experimental results show that, compared with state-of-the-art methods, our sampling methods improve the efficiency of aggregate statistical estimations.
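To make "unbiased estimators" under a query budget concrete, the sketch below uses the textbook Hansen-Hurwitz estimator for sampling with replacement and unequal probabilities. It is not the paper's method. It assumes, purely for illustration, that road segments are drawn with probability proportional to their length and that one API query returns the PoI count along the drawn segment. The segment lengths, PoI counts, and function names are hypothetical.

```python
import random
from typing import Sequence


def hansen_hurwitz_total(values: Sequence[float], probs: Sequence[float]) -> float:
    """Estimate a population total from draws made with replacement, where
    draw i was selected with probability probs[i] and yielded values[i]."""
    if len(values) != len(probs) or not values:
        raise ValueError("values and probs must be non-empty and equally long")
    return sum(v / p for v, p in zip(values, probs)) / len(values)


# Hypothetical region: road segment lengths (km) and PoI counts per segment.
lengths = [1.0, 2.0, 3.0, 4.0]
poi_counts = [2, 3, 7, 8]
probs = [length / sum(lengths) for length in lengths]  # longer roads drawn more often


def estimate_total_pois(num_queries: int, rng: random.Random) -> float:
    """Spend `num_queries` simulated API queries and estimate the PoI total."""
    drawn = rng.choices(range(len(lengths)), weights=probs, k=num_queries)
    return hansen_hurwitz_total(
        [poi_counts[i] for i in drawn],
        [probs[i] for i in drawn],
    )


# Unbiasedness check: the exact expectation of a single-draw estimate,
# sum_i p_i * (y_i / p_i), equals the true total sum_i y_i.
expected = sum(p * hansen_hurwitz_total([y], [p]) for y, p in zip(poi_counts, probs))
print(expected, sum(poi_counts))
print(estimate_total_pois(50, random.Random(0)))
```

Dividing each observation by its selection probability is what removes the bias introduced by sampling dense road areas more often; the variance then depends on how closely the selection probabilities track the PoI counts.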