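Abstract: Traditional sequential pattern mining (SPM) does not consider pattern repetition and ignores how the utility of each item (unit price or profit) and the pattern length affect user interest. To address this, a top-k one-off high average utility sequential pattern mining (TOUP) algorithm is proposed. TOUP consists of two core steps: average utility calculation and candidate pattern generation. First, the CSP (Calculation Support of Pattern) algorithm, which is based on the occurrence positions of each item and an item repetition relation array, computes pattern support, so that the average utility of a pattern can be calculated quickly. Second, candidate patterns are generated by itemset extension and sequence extension, and a maximum average utility upper bound is proposed and used to prune candidate patterns effectively. Experiments on five real datasets and one synthetic dataset show that, compared with the TOUP-dfs and HAOP-ms algorithms, TOUP reduces the number of candidate patterns by 38.5% to 99.8% and 0.9% to 77.6%, respectively, and reduces the running time by 33.6% to 97.1% and 57.9% to 97.2%, respectively. TOUP therefore performs better and mines the patterns that interest users more efficiently.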
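The interplay between a top-k result list and upper-bound pruning can be illustrated with a small sketch. This is not the TOUP algorithm itself: the abstract does not give the CSP procedure or the formula for the maximum average utility upper bound, so the class `TopK`, the method names, and the candidate scores below are all hypothetical, made-up illustration values. The sketch only shows the general idea that once k patterns are held, the smallest kept score becomes a threshold, and any candidate whose upper bound falls below it can be skipped.

```python
import heapq


class TopK:
    """Keep the k highest-scoring patterns in a min-heap (hypothetical helper)."""

    def __init__(self, k):
        self.k = k
        self.heap = []  # min-heap of (score, pattern)

    def threshold(self):
        # Until k patterns are held, nothing may be pruned.
        return self.heap[0][0] if len(self.heap) == self.k else float("-inf")

    def can_prune(self, upper_bound):
        # A candidate whose upper bound is below the current k-th best score
        # cannot enter the top-k list, and neither can anything it bounds.
        return upper_bound < self.threshold()

    def offer(self, score, pattern):
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (score, pattern))
        elif score > self.heap[0][0]:
            heapq.heapreplace(self.heap, (score, pattern))

    def result(self):
        return sorted(self.heap, reverse=True)


# Made-up candidates: (pattern, average utility, upper bound).
candidates = [("a", 5.0, 9.0), ("ab", 7.0, 8.0), ("c", 2.0, 3.0), ("ac", 6.0, 6.5)]

top = TopK(k=2)
pruned = []
for pattern, avg_utility, upper_bound in candidates:
    if top.can_prune(upper_bound):
        pruned.append(pattern)  # skipped without further evaluation
        continue
    top.offer(avg_utility, pattern)

print(top.result())  # patterns kept, best first
print(pruned)        # patterns skipped via the upper bound
```

The comparison in `can_prune` is strict so that candidates tying with the current threshold are still evaluated; a real miner would also decide whether to extend each surviving candidate.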
Abstract: How to deal with imprecise information retrieval has become increasingly important in today's information society. This paper discusses an efficient and effective method of information retrieval based on the multi-tuple rough set. The new approach is a generalization of the original rough set model for flexible information retrieval. Imprecise query results can be obtained by multi-tuple approximations.
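For readers unfamiliar with the original rough set model that this approach generalizes, the following minimal sketch computes the classical lower and upper approximations of a target set. It does not implement the multi-tuple generalization, which the abstract does not define. The function name `approximations`, the document identifiers, and the keyword tuples are hypothetical illustration values.

```python
from collections import defaultdict


def approximations(universe, key, target):
    """Return (lower, upper) approximations of `target`.

    Objects with the same `key` value are indiscernible and form one
    equivalence class, as in the classical rough set model.
    """
    classes = defaultdict(set)
    for obj in universe:
        classes[key(obj)].add(obj)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # class lies entirely inside the target
            lower |= block
        if block & target:    # class overlaps the target
            upper |= block
    return lower, upper


# Made-up documents described by index terms; d1 and d2 are indiscernible.
docs = {
    "d1": ("rough", "set"),
    "d2": ("rough", "set"),
    "d3": ("fuzzy", "set"),
    "d4": ("neural", "net"),
}
relevant = {"d1", "d3"}  # documents the user judged relevant

lower, upper = approximations(docs, lambda d: docs[d], relevant)
print(sorted(lower))  # certainly relevant
print(sorted(upper))  # possibly relevant
```

In a retrieval setting, the lower approximation can be read as the documents that certainly match and the upper approximation as those that possibly match, which is one way an imprecise query result can be expressed.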
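Abstract: Like mRNA, ncRNA is an important functional molecule. Using k-tuple (k-word) content as features, yeast ncRNA mature sequences were classified against and compared with the coding regions, upstream sequences, and downstream sequences of mRNA. The results show that, based on the 3-tuple content of ncRNA mature sequences and mRNA coding regions, the average leave-one-out cross-validation (LOOCV) classification accuracy for ncRNA and mRNA reaches 93.93%. Based on the 4-tuple and 5-tuple content of upstream sequences, the classification accuracies are 92.49% and 92.76%, respectively; based on the 4-tuple and 5-tuple content of downstream sequences, they are 91.58% and 90.60%, respectively. Using the 4-tuple and 5-tuple content of both upstream and downstream sequences, the average classification accuracies are 94.68% and 94.83%, respectively. A t-test identified the k-tuples that differ with statistical significance between the upstream and downstream sequences of ncRNA and mRNA. These results indicate that the 3-tuple content of ncRNA mature sequences and mRNA coding regions, and the 4-tuple or 5-tuple content of the upstream and downstream sequences of ncRNA and mRNA, can effectively distinguish ncRNA from mRNA. The findings not only help identify ncRNA and mRNA accurately but also help discover ncRNA-specific transcription factor binding sites.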
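The k-tuple content feature can be sketched as a relative-frequency vector over all 4^k possible k-tuples. The abstract does not state how the counts are normalized or which classifier is used, so the function name `ktuple_content`, the overlapping-window counting, the normalization to frequencies that sum to 1, and the example sequence are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def ktuple_content(seq, k):
    """Relative frequency of every k-tuple over the alphabet A, C, G, T.

    Overlapping windows are counted; windows containing other symbols
    (for example N) are ignored. RNA input is mapped to DNA letters.
    """
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    all_tuples = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[t] for t in all_tuples)
    return {t: (counts[t] / total if total else 0.0) for t in all_tuples}


# Made-up sequence; its six 3-tuple windows are ATG, TGA, GAT, ATG, TGC, GCC.
features = ktuple_content("ATGATGCC", 3)
print(len(features))    # number of features for k = 3
print(features["ATG"])  # share of windows equal to ATG
```

Each sequence then becomes a fixed-length vector (64 values for k = 3, 256 for k = 4, 1024 for k = 5) that can be passed to any classifier and evaluated with leave-one-out cross-validation.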
Funding: supported by the National Natural Science Foundation of China (71771118, 71471083), the Ministry of Education Humanities and Social Sciences Foundation of China (18YJCZH146), and the Nanjing University Double First-Class project.
Abstract: This study proposes a multiple attribute group decision-making (MAGDM) approach based on the plant growth simulation algorithm (PGSA) and interval 2-tuple weighted average operators for uncertain linguistic weighted aggregation (ULWA). We provide an example for illustration and verification and compare several aggregation operators to indicate the optimality of the assembly method. In addition, we present two comparisons to demonstrate the practicality and effectiveness of the proposed method. The method can be used not only for aggregation in MAGDM problems but also to handle multi-granularity uncertain linguistic information. Its high reliability, ease of programming, and fast computation can improve the efficiency of ULWA. Finally, the proposed method offers exact linguistic information processing and can effectively avoid information distortion and loss.
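As background for the 2-tuple operators mentioned here, the following minimal sketch shows the standard (non-interval) 2-tuple linguistic representation, in which a value beta on a label scale is written as a label index plus a symbolic translation in [-0.5, 0.5), together with a plain weighted average. It does not reproduce the paper's interval 2-tuple operators or the PGSA step. The function names, the five-label scale, the ratings, and the weights are hypothetical illustration values.

```python
import math


def to_two_tuple(beta):
    """Map beta on the label scale to (label index, symbolic translation)."""
    index = math.floor(beta + 0.5)  # round half up to the nearest label
    return index, beta - index


def from_two_tuple(index, alpha):
    """Inverse mapping: recover the numeric value beta."""
    return index + alpha


def weighted_average(two_tuples, weights):
    """Aggregate 2-tuples numerically, then convert back to a 2-tuple."""
    total = sum(weights)
    beta = sum(from_two_tuple(i, a) * w
               for (i, a), w in zip(two_tuples, weights)) / total
    return to_two_tuple(beta)


# Made-up scale and expert ratings.
labels = ["very low", "low", "medium", "high", "very high"]
ratings = [(3, 0.0), (2, 0.0), (4, 0.0)]  # high, medium, very high
weights = [0.5, 0.3, 0.2]

index, alpha = weighted_average(ratings, weights)
print(labels[index], round(alpha, 3))
```

Because the aggregated result keeps the symbolic translation instead of rounding to the nearest label, no information is lost in the conversion, which is the property the abstract refers to as avoiding information distortion and loss.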