Automatic Scanning of the PDB Databank
As high-throughput computational methods become more common, and because PyFREC provides means for quick screening of excited-state resonances, electronic couplings, and quantum dynamics simulations, it is convenient to have a tool for automatic extraction of structural information. The Protein Data Bank (PDB)19 provides a convenient interface for such operations. Employing urllib2,35 PyFREC automatically downloads and parses PDB files based on a user-provided list of PDB IDs. PyFREC then analyzes the downloaded structures (e.g., identifies chlorophyll pigments in the PDB files) in order to compute electronic couplings between the selected fragments. Currently, pigments are identified from their chemical structure and bond topology (e.g., a central Mg atom surrounded by nitrogen and oxygen atoms at particular distances).2 In the future, machine-learning algorithms (see below) will be used for this analysis.
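The download-and-identify workflow described above can be illustrated with a minimal Python 3 sketch. It is not PyFREC's actual code: the function names, the RCSB download URL pattern, the 2.5 Å Mg-N cutoff, and the requirement of four coordinating nitrogens are all assumptions chosen for illustration, and urllib.request is used as the Python 3 successor of urllib2. Oxygen contacts and bond topology, which the text also mentions, are ignored here for brevity.

```python
import math
import urllib.request

# Assumed download endpoint; not taken from the PyFREC source.
RCSB_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def fetch_pdb(pdb_id):
    """Download one PDB entry as text (requires network access)."""
    url = RCSB_URL.format(pdb_id=pdb_id.upper())
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_hetatm(pdb_text):
    """Read HETATM records using the fixed-column PDB layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        atoms.append({
            "name": line[12:16].strip(),
            "resname": line[17:20].strip(),
            "chain": line[21],
            "resseq": int(line[22:26]),
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            "element": line[76:78].strip().upper(),
        })
    return atoms


def find_mg_pigments(atoms, cutoff=2.5, min_nitrogens=4):
    """Flag residues whose Mg atom has enough nitrogens within `cutoff` (in angstroms).

    Both thresholds are illustrative values, not PyFREC's criteria.
    """
    nitrogens = [a for a in atoms if a["element"] == "N"]
    hits = []
    for mg in (a for a in atoms if a["element"] == "MG"):
        n_close = sum(
            1 for n in nitrogens if math.dist(mg["xyz"], n["xyz"]) <= cutoff
        )
        if n_close >= min_nitrogens:
            hits.append((mg["chain"], mg["resname"], mg["resseq"]))
    return hits
```

With network access, a list of PDB IDs could be screened by calling find_mg_pigments(parse_hetatm(fetch_pdb(pdb_id))) for each ID. A purely distance-based test of this kind is only a first filter; isolated Mg ions are rejected because they lack the four coordinating nitrogens.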
Processing multiple molecular structures (e.g., proteins from the PDB) produces datasets that can be interpreted and analyzed using network (graph) theory. For example, electronic couplings or orientation factors, which characterize interactions between pigments and affect exciton energy transfer, can be rationalized in terms of network theory. PyFREC employs the NetworkX library36 to generate and analyze networks. Various properties of the network are computed, including the average shortest path length, the average clustering coefficient, and the current-flow closeness centrality.
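As an illustration of the network analysis described above, the following sketch builds a small pigment network with NetworkX and evaluates the three properties named in the text. The pigment labels, coupling values, and the 10 cm^-1 threshold are hypothetical, and the choice of which quantity to use as an edge weight is an assumption, not PyFREC's documented behavior. The NetworkX current-flow routine additionally requires NumPy and SciPy and a connected graph.

```python
import networkx as nx

# Hypothetical electronic couplings (cm^-1) between made-up pigment labels.
couplings = {
    ("CLA601", "CLA602"): 95.0,
    ("CLA602", "CLA603"): 30.0,
    ("CLA601", "CLA603"): 12.0,
    ("CLA603", "CLA604"): 55.0,
}


def build_coupling_network(couplings, threshold=10.0):
    """Connect pigment pairs whose |coupling| reaches an illustrative threshold."""
    graph = nx.Graph()
    for (a, b), value in couplings.items():
        if abs(value) >= threshold:
            graph.add_edge(a, b, coupling=abs(value))
    return graph


def network_summary(graph):
    """Compute the three network properties named in the text."""
    return {
        # Topological (unweighted) path length and clustering.
        "avg_shortest_path": nx.average_shortest_path_length(graph),
        "avg_clustering": nx.average_clustering(graph),
        # Edge weights act as conductances here: stronger coupling, easier flow.
        "cf_closeness": nx.current_flow_closeness_centrality(
            graph, weight="coupling"
        ),
    }


summary = network_summary(build_coupling_network(couplings))
```

Treating the coupling magnitude as a conductance is one plausible mapping for current-flow measures; for weighted shortest paths, a distance-like weight such as the inverse coupling would be the analogous choice.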