Each protein chain in the training and testing datasets is represented using 4 lines:
Line 1: >protein ID: the protein identifier used in DisProt
Line 2: protein sequence (1-letter amino acid encoding)
Line 3: The protein-binding annotations, where 0 indicates non-protein-binding and 1 indicates protein-binding
Line 4: The other-binding annotations (including DNA-binding, RNA-binding, and small ligand-binding), where 0 indicates non-other-binding and 1 indicates other-binding
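
The 4-line layout above can be read with a short parser. The following is a minimal Python sketch, not part of the dataset distribution. The names ProteinRecord and parse_records are hypothetical, and the sketch assumes that blank lines carry no data and that each annotation line has one 0/1 character per residue of the sequence. Relax those checks if the files differ.

```python
from typing import Iterable, Iterator, NamedTuple


class ProteinRecord(NamedTuple):
    """One 4-line record (hypothetical container, not defined by the dataset)."""
    protein_id: str        # Line 1 without the leading '>'
    sequence: str          # Line 2, 1-letter amino acid encoding
    protein_binding: str   # Line 3, string of 0/1
    other_binding: str     # Line 4, string of 0/1


def parse_records(lines: Iterable[str]) -> Iterator[ProteinRecord]:
    """Yield one ProteinRecord per group of 4 non-blank lines."""
    # Assumption: blank lines are not meaningful and can be skipped.
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    if len(cleaned) % 4 != 0:
        raise ValueError("expected a multiple of 4 non-blank lines")

    for start in range(0, len(cleaned), 4):
        header, seq, prot, other = cleaned[start:start + 4]
        record_no = start // 4 + 1
        if not header.startswith(">"):
            raise ValueError(f"record {record_no}: header must start with '>'")
        for name, ann in (("protein-binding", prot), ("other-binding", other)):
            # The format description only mentions the labels 0 and 1.
            if set(ann) - {"0", "1"}:
                raise ValueError(f"record {record_no}: {name} line has non-0/1 labels")
            # Assumption: one annotation character per residue.
            if len(ann) != len(seq):
                raise ValueError(f"record {record_no}: {name} length differs from sequence")
        yield ProteinRecord(header[1:], seq, prot, other)
```

To read a file, pass an open file handle, for example `with open(path) as fh: records = list(parse_records(fh))`, where `path` points to one of the dataset files.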