Bulletin of Siberian Medicine

Protein analysis capabilities in the NCBI bioinformation system

https://doi.org/10.20538/1682-0363-2025-4-194-203

Abstract

Aim. To review and summarize information about the features of protein data storage, as well as the possibilities for their analysis using NCBI tools.

The lecture summarizes data on existing repositories of protein sequences and structures and analyzes the capabilities of the bioinformatics tools for protein research available on the National Center for Biotechnology Information (NCBI) platform. The primary databases contain protein records obtained through experimental studies; databases with supplementary information added by curators after analysis are also presented. Bioinformatic analysis of protein sequences and structures with the tools discussed in this lecture makes it possible to identify phylogenetic features and to predict functions and structures. Extracting this information and analyzing it with specialized services thus allows in silico study of protein characteristics that have not been detected experimentally, providing new knowledge that forms the basis for further investigations.

About the Author

N. Yu. Chasovskikh
Siberian State Medical University (SibSMU)
Russian Federation

2 Moskovsky trakt, 634050 Tomsk




Review

For citations:


Chasovskikh N.Yu. Protein analysis capabilities in the NCBI bioinformation system. Bulletin of Siberian Medicine. 2025;24(4):194-203. (In Russ.) https://doi.org/10.20538/1682-0363-2025-4-194-203


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1682-0363 (Print)
ISSN 1819-3684 (Online)