pandas - Iteratively appending new strings as elements of a list


I am working with a data frame that has a column of number ranges in this format: [[45, 45, 'D'],[46, 49, 'C'],[50, 66, 'S'],[67, 101, 'C'],[102, 103, 'S'],[104, 106, 'C'],[107, 108, 'S'],[109, 120, 'C'],[121, 121, 'S'],[122, 123, 'C'],[124, 140, 'S'],[141, 149, 'C'],[150, 176, 'S'],[177, 178, 'C'],[179, 181, 'S'],[182, 194, 'C'],[195, 213, 'S'],[214, 217, 'C']]

These numbers correspond to the positions of characters in a string, in this case the string: 'MGILSFLPVLATESDWADCKSPQPWGHMLLWTAVLFLAPVAGTPAAPPKAVLKLEPQWINVLQEDSVTLTCRGTHSPESDSIQWFHNGNLIPTHTQPSYRFKANNNDSGEYTCQTGQTSLSDPVHLTVLSEWLVLQTPHLEFQEGETIVLRCHSWKDKPLVKVTFFQNGKSKKFSRSDPNFSIPQANHSHSGDYHCTGNIGYTLYSSKPVTITVQAPSSSPMGIIVAVVTGIAVAAIVAAVVALIYCRKKRISALPGYPECREMGETLPEKPANPTNPDEADKVGAENTITYSLLMHPDALEEPDDQNRI'

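As you can see, some characters in the string do not correspond to any number in the list (for example, position 44 is missing), so the character at position 44 has to be removed to produce a shorter letter sequence.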

This is the code that does it for a single row:


new_s = ''
for item in res:
    new_s += strSeq[item[0]-1:item[1]]

print(len(new_s), new_s)

This is what I tried in order to get it for all rows:


shortenedSeq_list = []
counter = 0
stringstring = []
for rows in df.itertuples():
    strSeq2 = [rows.sequence]
    strremove2 = [rows.shortened_mobidb_consensus]
    for item in strremove2:
        res = ast.literal_eval(item)
        for item in res:
            stringstring.append(strSeq2[item[0]-1:item[1]])

stringstring

But this results in the output:


 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 ['MGKGKPRGLNSARKLRVHRRNNRWAETTYKKRLLGTAFKSSPFGGSSHAKGIVLEKIGIESKQPNSAIRKCVRVQLIKNGKKVTAFVPNDGCLNFVDENDEVLLAGFGRKGKAKGDIPGVRFKVVKVSGVSLLALWKEKKEKPRS'],
 [],
 [],

whereas I want each entry in the list to be the shortened sequence for one row.

Ultimately, I want to add this list as a column in the dataframe.
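One likely reason for the empty lists, sketched here with a made-up toy sequence rather than the real data: `strSeq2 = [rows.sequence]` wraps the sequence in a one-element list, so the slice `strSeq2[item[0]-1:item[1]]` is applied to that list instead of to the string. Any range that starts after position 1 then falls past the single element and yields `[]`, while a range that starts at 1 returns the whole wrapped sequence.

```python
seq = "ABCDEFGHIJ"   # toy sequence, an assumption for illustration
wrapped = [seq]      # what `strSeq2 = [rows.sequence]` produces

print(wrapped[1:3])  # slice of the one-element list, starting past index 0
print(wrapped[0:1])  # a range starting at 1 returns the list with the full string
print(seq[1:3])      # slicing the string itself gives the characters
```

Slicing the string directly is what the single-row code does, which would explain why that version works.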

Update

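The numbers are stored as a string rather than a list, so res is the list parsed from that string. This is the output of the working single-row code:

173 AAPPKAVLKLEPQWINVLQEDSVTLTCRGTHSPESDSIQWFHNGNLIPTHTQPSYRFKANNNDSGEYTCQTGQTSLSDPVHLTVLSEWLVLQTPHLEFQEGETIVLRCHSWKDKPLVKVTFFQNGKSKKFSRSDPNFSIPQANHSHSGDYHCTGNIGYTLYSSKPVTITVQAP

where 173 is the length of the shortened sequence, followed by the sequence itself.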
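A minimal sketch of that parsing step, using toy values that are assumptions for illustration and not rows from the real data: `ast.literal_eval` turns the string into a real list, after which the single-row loop can slice the sequence.

```python
import ast

# Toy values (assumptions for illustration, not rows from the real data)
raw = "[[2, 3, 'S'], [6, 8, 'C']]"   # the ranges as they arrive: one string
strSeq = "ABCDEFGHIJ"

res = ast.literal_eval(raw)           # now a real list of [start, end, label]

new_s = ''
for item in res:
    new_s += strSeq[item[0]-1:item[1]]  # positions are 1-based and inclusive

print(len(new_s), new_s)  # 5 BCFGH
```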

Example df:


   shortened_mobidb_consensus  sequence
0  [[45, 45, 'D'], [46, 49, 'C'], [50, 66, 'S'], [67, 101, 'C'], [102, 103, 'S'], [104, 106, 'C'], [107, 108, 'S'], [109, 120, 'C'], [121, 121, 'S'], [122, 123, 'C'], [124, 140, 'S'], [141, 149, 'C'], [150, 176, 'S'], [177, 178, 'C'], [179, 181, 'S'], [182, 194, 'C'], [195, 213, 'S'], [214, 217, 'C']]  MGILSFLPVLATESDWADCKSPQPWGHMLLWTAVLFLAPVAGTPAAPPKAVLKLEPQWINVLQEDSVTLTCRGTHSPESDSIQWFHNGNLIPTHTQPSYRFKANNNDSGEYTCQTGQTSLSDPVHLTVLSEWLVLQTPHLEFQEGETIVLRCHSWKDKPLVKVTFFQNGKSKKFSRSDPNFSIPQANHSHSGDYHCTGNIGYTLYSSKPVTITVQAPSSSPMGIIVAVVTGIAVAAIVAAVVALIYCRKKRISALPGYPECREMGETLPEKPANPTNPDEADKVGAENTITYSLLMHPDALEEPDDQNRI
1  [[1, 1, 'D'], [2, 143, 'S'], [144, 145, 'C']]  MGKGKPRGLNSARKLRVHRRNNRWAETTYKKRLLGTAFKSSPFGGSSHAKGIVLEKIGIESKQPNSAIRKCVRVQLIKNGKKVTAFVPNDGCLNFVDENDEVLLAGFGRKGKAKGDIPGVRFKVVKVSGVSLLALWKEKKEKPRS
2  [[1, 145, 'S']]  MGKGKPRGLNSARKLRVHRRNNRWAETTYKKRLLGTAFKSSPFGGSSHAKGIVLEKIGIESKQPNSAIRKCVRVQLIKNGKKVTAFVPNDGCLNFVDENDEVLLAGFGRKGKAKGDIPGVRFKVVKVSGVSLLALWKEKKEKPRS
3  [[1, 1, 'D'], [2, 2, 'C'], [3, 37, 'S'], [38, 39, 'C'], [40, 40, 'S'], [41, 41, 'C'], [42, 62, 'S'], [63, 65, 'C'], [66, 231, 'S']]  MSKNILVLGGSGALGAEVVKFFKSKSWNTISIDFRENPNADHSFTIKDSGEEEIKSVIEKINSKSIKVDTFVCAAGGWSGGNASSDEFLKSVKGMIDMNLYSAFASAHIGAKLLNQGGLFVLTGASAALNRTSGMIAYGATKAATHHIIKDLASENGGLPAGSTSLGILPVTLDTPTNRKYMSDANFDDWTPLSEVAEKLFEWSTNSDSRPTNGSLVKFETKSKVTTWTNL
4  [[24, 29, 'D'], [30, 91, 'S'], [92, 92, 'D']]  MKVSTTALAVLLCTMTLCNQVFSAPYGADTPTACCFSYSRKIPRQFIVDYFETSSLCSQPGVIFLTKRNRQICADSKETWVQEYITDLELNA


I guess it should be:


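import ast

stringstring = []
for rows in df.itertuples():
    strSeq2 = rows.sequence  # keep the sequence as a plain string, not a one-element list
    # the ranges are stored as a string, so parse them into a list first
    res = ast.literal_eval(rows.shortened_mobidb_consensus)
    new_s = ''
    for item in res:
        # positions are 1-based and inclusive
        new_s += strSeq2[item[0]-1:item[1]]
    stringstring.append(new_s)  # one shortened sequence per row

stringstring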
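Since the end goal is a new column, the same idea can be sketched with a small helper and a list comprehension over the two columns. The helper name `shorten`, the column name `shortened_sequence`, and the toy data are all assumptions for illustration; only `sequence` and `shortened_mobidb_consensus` come from the question.

```python
import ast

import pandas as pd


def shorten(sequence, ranges):
    """Join the 1-based, inclusive [start, end, label] ranges of `sequence`."""
    if isinstance(ranges, str):  # ranges may be stored as a string, as in the question
        ranges = ast.literal_eval(ranges)
    return "".join(sequence[start - 1:end] for start, end, _label in ranges)


# Toy frame (made-up data) with the same two column names as the question
df = pd.DataFrame({
    "shortened_mobidb_consensus": [
        "[[2, 3, 'S'], [6, 8, 'C']]",
        "[[1, 1, 'D'], [4, 5, 'S']]",
    ],
    "sequence": ["ABCDEFGHIJ", "KLMNOPQ"],
})

# `shortened_sequence` is a hypothetical column name
df["shortened_sequence"] = [
    shorten(seq, rng)
    for seq, rng in zip(df["sequence"], df["shortened_mobidb_consensus"])
]

print(df["shortened_sequence"].tolist())  # ['BCFGH', 'KNO']
```

Building the whole list first and assigning it once keeps the column aligned with the rows and avoids appending to the dataframe inside a loop.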



...