Title:  How entangled can proteins be? Prediction and in vitro verification.
Abstract:  The presence of knots in protein 3D structures is well established [1]. However, as they are not common [2], only a fraction of such proteins is available in the PDB. Until now it was not possible to assess their importance and versatility, because the whole proteome of an organism was not accessible. The development of efficient machine learning methods for protein structure prediction, such as AlphaFold and RoseTTAFold, changed that [3].

First, we analyzed all proteins from the human proteome (over 20 000) in search of knots and found them in less than 2% of the structures. Using a variety of methods, including homolog search, clustering, quality assessment and visual inspection, we determined the nature of each knotted structure, classified it as knotted, potentially knotted or an artifact, and deposited it in a database available at: https://knotprot.cent.uw.edu.pl/alphafold. It turns out that over 75% of knots in human proteins may be artifacts. However, among the potential knots we found the 6_3 knot, which would be the first knot in proteins with an unknotting number higher than one [4].

Second, we focused on proteins that can be even more complex and possess two knots on a single chain (up to now, only single knots had been found). For the first time, we searched different databases for doubly knotted proteins. Using AlphaFold, we predicted a few families of doubly knotted proteins and studied their structure and function in detail. Using an experimental approach, we showed that such proteins can fold and perform their intended function 3, Fig. 1.

FIG. 1. Predicted structure of CnTrmD-Tm1570 fusion protein based on AlphaFold and docking.
This homodimeric (second chain is transparent) complex binds tRNA (green) with its TrmD domains.

Finally, we established AlphaKnot [5], the first server to assess the entanglement of AlphaFold-solved protein models with regard to their pLDDT data. The server has two main functionalities. The first is a database of structures from all 21 full proteomes solved by AlphaFold that were published up to 2022. The second is a user-friendly web server for researchers to analyze their own AlphaFold predictions. Using the pLDDT confidence score, we classified predictions into categories that allow a detailed analysis of whether the protein model is correctly solved. This allowed us to discover new types of knots in the human proteome [5]. By cross-validating AlphaFold predictions with our server and RoseTTAFold predictions, we showed that AlphaFold, while overall a great tool, can have problems with correctly modeling the knot topology of proteins. We show examples of AlphaFold models with wrongly predicted topology and give possible explanations of such occurrences.
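
The idea of weighing a predicted knot by its pLDDT can be illustrated with a minimal Python sketch. It is not the AlphaKnot implementation: the function names, the notion of a "knot core" residue range and the use of a mean score are assumptions made for illustration only. It relies on two facts: AlphaFold writes per-residue pLDDT into the B-factor column of its PDB-format models, and the AlphaFold database reports confidence in four bands (above 90, 70 to 90, 50 to 70, below 50).

```python
from statistics import mean

# AlphaFold DB confidence bands as (lower bound, label).
# Lower bounds are treated as inclusive here, which is a simplification.
BANDS = [(90.0, "very high"), (70.0, "confident"), (50.0, "low"), (0.0, "very low")]


def residue_plddt(pdb_text):
    """Map residue number -> pLDDT, read from the B-factor column of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, residue number 23-26,
        # B-factor 61-66 (AlphaFold stores pLDDT there).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def band(score):
    """Return the confidence band label for a single pLDDT value."""
    for lower, label in BANDS:
        if score >= lower:
            return label
    return "very low"


def knot_core_confidence(scores, start, end):
    """Mean pLDDT and band over a hypothetical knot core, residues start..end."""
    core = [s for r, s in scores.items() if start <= r <= end]
    if not core:
        raise ValueError("no residues in the requested range")
    avg = mean(core)
    return avg, band(avg)
```

A knot whose core falls mostly in the low bands would be a natural candidate for closer inspection as a possible artifact, although the actual categories used by the server may be defined differently.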

This work was supported by the National Science Centre #UMO-2018/31/B/NZ1/04016 (to JIS) and COST EUTOPIA action.

[1] J.I. Sulkowska, On folding of entangled proteins: knots, lassos, links and theta-curves, Current Opinion in Structural Biology, 60: 131-141 (2020).
[2] M. Jamroz et al., KnotProt: a database of proteins with knots and slipknots, Nucleic Acids Research, 43: D306-D314 (2014).
[3] J. Jumper et al., Highly accurate protein structure prediction with AlphaFold, Nature, 596.7873: 583-589 (2021).
[4] A. Perlinska et al., New 6_3 knot and other knots in human proteome from AlphaFold predictions [https://doi.org/10.1101/2021.12.30.474018].
[5] W. Niemyska et al., AlphaKnot: Server to analyze entanglement in structures predicted by AlphaFold methods, Nucleic Acids Research [doi:10.1093/nar/gkn000].