Follow me on CSDN: https://spike.blog.csdn/
Original post: https://blog.csdn/caroline_wendy/article/details/129881370

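A protein complex, or multiprotein complex, is a group of two or more associated polypeptide chains. Protein complexes are a form of quaternary structure. The proteins in a complex are linked by non-covalent protein-protein interactions. These complexes are a cornerstone of many biological processes. By bringing their components into close proximity, they greatly increase the speed and selectivity of binding between the complex and its substrates, which improves cellular efficiency. Many of the techniques used to get into cells and isolate proteins are inherently disruptive to such large complexes, which complicates the task of determining a complex's components. In stable complexes, the large hydrophobic interfaces between proteins typically bury more than 2500 square ångströms (Å²) of surface area.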

[Figure: structure of the 8CZ5 antibody-antigen complex]

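From the complex PDB file, extract the structures of the antibody (Ab-HL, heavy and light chains together), the heavy chain (H), the light chain (L), and the antigen (Ag), and save each one to its own PDB file. In effect, this takes the PDB structure apart.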
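Before splitting, it helps to know which chain IDs the file actually contains. The following is a minimal sketch that uses only the standard library and relies on the fixed-column PDB format, where ATOM and HETATM records carry the chain ID in column 22. The names `list_chain_ids` and `make_ca_line` are hypothetical helpers, and the toy records with chains H, L and A are placeholders, not data from 8CZ5.

```python
def list_chain_ids(pdb_lines):
    """Return the chain IDs of the first model, in order of first appearance."""
    seen = {}
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # only scan the first model
        # In the fixed-column PDB format, ATOM/HETATM records carry the
        # chain ID in column 22 (string index 21).
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            seen[line[21]] = None
    return list(seen)


def make_ca_line(serial, chain_id, res_seq):
    """Build a minimal, hypothetical CA ATOM record for illustration."""
    return (
        f"ATOM  {serial:>5d}  CA  GLY {chain_id}{res_seq:>4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}"
    )


# Hypothetical toy input: chain IDs H, L and A are placeholders.
toy_lines = [
    make_ca_line(1, "H", 1),
    make_ca_line(2, "H", 2),
    make_ca_line(3, "L", 1),
    make_ca_line(4, "A", 1),
]
print(list_chain_ids(toy_lines))
```

For a real file, the same function can be fed an open file handle, for example `with open(path) as f: list_chain_ids(f)`, since it only iterates over lines.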

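The functions are:

  • get_multiple_chains_pdb_from_complex: extracts several chains from a PDB file
  • get_single_chain_pdb_from_complex: extracts a single chain from a PDB file
  • split_antibody_antigen_complex: splits a PDB file, for antibody-antigen complexes only

The source code follows; parts of it come from ChatGPT: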

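import os
import shutil

from Bio import PDB


def get_multiple_chains_pdb_from_complex(input_pdb, chain_ids, output_pdb):
    """
    Extract several chains from an input complex PDB file and write them together to a new PDB file.
    Code from ChatGPT.
    :param input_pdb: path to the input multi-chain PDB file
    :param chain_ids: IDs of the chains to keep
    :param output_pdb: path to the output PDB file
    """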
    # Create a PDB parser object
    parser = PDB.PDBParser(QUIET=True)
    # Parse the input pdb file and get the structure object
    structure = parser.get_structure("input", input_pdb)
    # Create a PDB io object
    io = PDB.PDBIO()
    # Set the structure object to the io object
    io.set_structure(structure)
    # Create an empty list to store the selected chains
    selected_chains = []
    # Loop through the chain ids
    for chain_id in chain_ids:
        # Get the chain object from the structure object by the chain id
        chain = structure[0][chain_id]
        # Append the chain object to the selected list
        selected_chains.append(chain)

    # Create a select class that only accepts the selected chains
    class Select(PDB.Select):
        def accept_chain(self, chain_select):
            return chain_select in selected_chains

    # Save the selected chains to the output pdb file using the select class
    io.save(output_pdb, Select())


def get_single_chain_pdb_from_complex(input_pdb, chain_id, output_pdb):
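    """
    Extract a single chain from a multi-chain PDB file. Code from ChatGPT.
    :param input_pdb: path to the input multi-chain PDB file
    :param chain_id: ID of the chain to extract
    :param output_pdb: path to the output single-chain PDB file
    :return: None
    """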
    parser = PDB.PDBParser(QUIET=True)
    try:
        structure = parser.get_structure("pdb", input_pdb)
    except IOError:
        raise Exception("Error: could not read pdb file")
    chain = None
    for model in structure:
        for c in model:
            if c.get_id() == chain_id:
                chain = c
                break
        if chain is not None:
            break
    if chain is None:
        # Format the chain id into the message instead of passing it as a second argument
        raise ValueError(f"Error: could not find chain with name {chain_id}")

    io = PDB.PDBIO()

    class ChainSelect(PDB.Select):
        def accept_chain(self, c):
            return c == chain

    io.set_structure(structure)
    io.save(output_pdb, ChainSelect())


def split_antibody_antigen_complex(input_pdb_path, pdb_name, chain_ids, out_dir):
    """
    Split an antibody-antigen complex:
    1. save the complex pdb.
    2. save the antibody heavy chain pdb, light chain pdb and heavy-light chain complex pdb.
    3. save the antigen chain pdb.
    All pdbs are saved into a subfolder of out_dir named after pdb_name.
    :param input_pdb_path: input complex pdb file
    :param pdb_name: name of the pdb, used for the subfolder and the file names
    :param chain_ids: Attention! The order is heavy, light, antigen.
    :param out_dir: output dir; the function creates a subdir in it.
    """
    assert len(chain_ids) == 3, \
        f"length of chain_ids must be 3 (heavy, light, antigen), but input is {chain_ids}!"
    # Create out_dir/pdb_name, including any missing parent directories
    pdb_out_dir = os.path.join(out_dir, pdb_name)
    os.makedirs(pdb_out_dir, exist_ok=True)

    # input: keep a copy of the original complex next to the outputs
    pdb_path = os.path.join(pdb_out_dir, f"{pdb_name}.pdb")
    shutil.copy(input_pdb_path, pdb_path)

    # complex: antibody + antigen, and antibody only (heavy + light)
    ab_ag_path = os.path.join(pdb_out_dir, f"{pdb_name}_{''.join(chain_ids)}_ab_ag.pdb")
    ab_path = os.path.join(pdb_out_dir, f"{pdb_name}_{''.join(chain_ids[:2])}_ab.pdb")

    # single chain: heavy, light, antigen
    h_path = os.path.join(pdb_out_dir, f"{pdb_name}_{chain_ids[0]}_hc.pdb")
    l_path = os.path.join(pdb_out_dir, f"{pdb_name}_{chain_ids[1]}_lc.pdb")
    ag_path = os.path.join(pdb_out_dir, f"{pdb_name}_{chain_ids[2]}_ag.pdb")

    get_multiple_chains_pdb_from_complex(pdb_path, chain_ids, ab_ag_path)
    get_multiple_chains_pdb_from_complex(pdb_path, chain_ids[:2], ab_path)
    get_single_chain_pdb_from_complex(pdb_path, chain_ids[0], h_path)
    get_single_chain_pdb_from_complex(pdb_path, chain_ids[1], l_path)
    get_single_chain_pdb_from_complex(pdb_path, chain_ids[2], ag_path)
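
To make the output layout of split_antibody_antigen_complex easier to see, here is a small sketch that only reproduces its file-naming scheme, without reading or writing any structure. The helper `expected_output_paths` is hypothetical and not part of the original code, and the chain IDs "H", "L" and "A" are placeholders; the real IDs have to be checked in the PDB file itself.

```python
import os


def expected_output_paths(out_dir, pdb_name, chain_ids):
    """Mirror the file naming used by split_antibody_antigen_complex."""
    heavy, light, antigen = chain_ids  # order: heavy, light, antigen
    pdb_out_dir = os.path.join(out_dir, pdb_name)
    names = {
        "input": f"{pdb_name}.pdb",
        "ab_ag": f"{pdb_name}_{''.join(chain_ids)}_ab_ag.pdb",
        "ab": f"{pdb_name}_{''.join(chain_ids[:2])}_ab.pdb",
        "hc": f"{pdb_name}_{heavy}_hc.pdb",
        "lc": f"{pdb_name}_{light}_lc.pdb",
        "ag": f"{pdb_name}_{antigen}_ag.pdb",
    }
    return {key: os.path.join(pdb_out_dir, name) for key, name in names.items()}


# Chain IDs "H", "L", "A" are placeholders for illustration only.
for key, path in expected_output_paths("out", "8CZ5", ["H", "L", "A"]).items():
    print(key, path)
```

With these placeholder IDs, the antibody-antigen file would be named 8CZ5_HLA_ab_ag.pdb and the antibody-only file 8CZ5_HL_ab.pdb, all inside the out/8CZ5 subfolder.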
