Zahra Narimani

Assistant Professor, Department of Computer Science and Information Technology

Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran


Contact information

narimani [at] iasbs.ac.ir

+98 24 3315 3374

Institute for Advanced Studies in Basic Sciences (IASBS), No. 444, Prof. Yousef Sobouti Blvd., Zanjan 45137-66731, Iran



The International Conference on Contemporary Issues in Data Science

March 5-8, 2019

Institute for Advanced Studies in Basic Sciences (IASBS)


Journal publications

A knowledge-based protein-protein interaction inhibition (KPI) pipeline: an insight from drug repositioning for COVID-19 inhibition.

Lanjanian H, Hosseini S, Narimani Z, Meknatkhah S, Riazi GH

Journal of Biomolecular Structure and Dynamics. 2022 Dec 27:1-4.

The inhibition of protein-protein interactions (PPIs) by small molecules is an exciting drug discovery strategy. Here, we aimed to develop a pipeline to identify candidate small molecules to inhibit PPIs. Therefore, KPI, a Knowledge-based Protein-Protein Interaction Inhibition pipeline, was introduced to improve the discovery of PPI inhibitors. Then, phytochemicals from a collection of known Middle Eastern antiviral herbs were screened to identify potential inhibitors of key PPIs involved in COVID-19. Here, the following investigations were sequenced: 1) Finding the binding partner and the interface of the proteins in PPIs, 2) Performing the blind ligand-protein inhibition (LPI) simulations, 3) Performing the local LPI simulations, 4) Simulating the interactions of the proteins and their binding partner in the presence and absence of the ligands, and 5) Performing the molecular dynamics simulations. The pharmacophore groups involved in the LPI were also characterized. Aloin, Genistein, Neoglucobrassicin, and Rutin are our new pipeline candidates for inhibiting PPIs involved in COVID-19. We also propose KPI for drug repositioning studies.

Ensemble learning from ensemble docking: revisiting the optimum ensemble size problem

Mohammadi S, Narimani Z, Ashouri M, Firouzi R, Karimi-Jafari MH

Scientific Reports. 2022 Jan 10;12(1):1-5.

Despite considerable advances obtained by applying machine learning approaches in protein–ligand affinity predictions, the incorporation of receptor flexibility has remained an important bottleneck. While ensemble docking has been used widely as a solution to this problem, the optimum choice of receptor conformations is still an open question considering the issues related to the computational cost and false positive pose predictions. Here, a combination of ensemble learning and ensemble docking is suggested to rank different conformations of the target protein in light of their importance for the final accuracy of the model. Available X-ray structures of cyclin-dependent kinase 2 (CDK2) in complex with different ligands are used as an initial receptor ensemble, and its redundancy is removed through a graph-based redundancy removal, which is shown to be more efficient and less subjective than clustering-based representative selection methods. A set of ligands with available experimental affinity are docked to this nonredundant receptor ensemble, and the energetic features of the best scored poses are used in an ensemble learning procedure based on the random forest method. The importance of receptors is obtained through feature selection measures, and it is shown that a few of the most important conformations are sufficient to reach 1 kcal/mol accuracy in affinity prediction with considerable improvement of the early enrichment power of the models compared to the different ensemble docking without learning strategies. A clear strategy has been provided in which machine learning selects the most important experimental conformers of the receptor among a large set of protein–ligand complexes while simultaneously maintaining the final accuracy of affinity predictions at the highest level possible for available data. Our results could be informative for future attempts to design receptor-specific docking-rescoring strategies.

Importance of data preprocessing in time series prediction using SARIMA: a case study

Adineh AH, Narimani Z, Satapathy SC

International Journal of Knowledge-Based and Intelligent Engineering Systems. 2020 Jan 1;24(4):331-42.

Over the last decades, time series data analysis has been of particular practical importance. Different domains such as financial data analysis, biological data analysis, and speech recognition inherently deal with time-dependent signals. Monitoring the past behavior of signals is key to precisely predicting the behavior of a system in the near future. In scenarios such as financial data prediction, the predominant signal has a periodic behavior (starting from the beginning of the month, week, etc.) and a general trend and seasonal behavior can also be assumed. The Autoregressive Integrated Moving Average (ARIMA) model and its seasonal extension, SARIMA, have been widely used in forecasting time-series data, and are also capable of dealing with the seasonal behavior/trend in the data. Although the behavior of data may be autoregressive and trends and seasonality can be detected and handled by SARIMA, the data is not always exactly compatible with SARIMA (or more generally ARIMA) assumptions. In addition, the existence of missing data is not pre-assumed in SARIMA, while in the real world there can always be missing data for different reasons, such as holidays for which no data may be recorded. Different working hours on different weekdays may also cause irregular patterns compared to what is expected under SARIMA assumptions. In this paper, we investigate the effectiveness of applying SARIMA to such real-world data, and demonstrate preprocessing methods that can be applied in order to make the data more suitable for modeling by SARIMA. The data in this study is derived from transactions of a mutual fund investment company; it contains missing values (single points and intervals) and also irregularities resulting from the number of working hours differing between weekdays, which makes the data inconsistent and leads to poor results without preprocessing. In addition, the number of data points was not adequate at the time of analysis to fit a SARIMA model. Preprocessing steps such as filling missing values and tricks to make the data consistent have been proposed to deal with the existing problems. Results show that the prediction performance of SARIMA on this set of real-world data is significantly improved by applying several preprocessing steps introduced to deal with the mentioned circumstances. The proposed preprocessing steps can be used in other real-world time-series data analyses.

Network based identification of different mechanisms underlying pathogenesis of human papilloma virus-active and human papilloma virus-negative oropharyngeal squamous cell carcinoma

Iman M, Narimani Z, Hamraz I, Ansari E

Journal of the Chinese Chemical Society. 2018 Nov;65(11):1307-16.

A different molecular mechanism underlying human papilloma virus (HPV)-negative and HPV-active pathogenesis is responsible for better response to therapies in HPV-associated oropharyngeal squamous cell carcinoma (OPSCC). In this study, we aim to provide an insight into molecular basis underlying this distinction and introduce possible targeted therapies for each phenotype. Using weighted gene co-expression network analysis (WGCNA), our aim was to identify not only differentially expressed genes but also significant coexpressed gene modules responsible for genotype and phenotype distinctions between HPV-active and HPV-negative samples. Recognizing differentially expressed genes in each module indicates key regulators that may be ignored in an analysis only based on differential gene expression study. Two modules are investigated in detail in our analysis, related to JAK-STAT dysregulation in HPV‐negative samples, and disruption of cell fate commitment possibly induced by overexpression of BCL2 is observed in the HPV-active cohort. The existence of differentially expressed oncogenes and potential miRNA role is investigated in our analysis. The other significant module related to keratinization, keratinocyte differentiation, and intermediate filament cytoskeleton organization was discovered in the resulting co-expression network. A considerable number of genes was downregulated in HPV-active samples in the relative module, postulating the impairment of cytoskeleton-related gene expression caused by HPV intervention.

Expectation propagation for large scale Bayesian inference of non-linear molecular networks from perturbation data

Narimani Z, Beigy H, Ahmad A, Masoudi-Nejad A, Fröhlich H

PloS one. 2017 Feb 6;12(2):e0171240.

Inferring the structure of molecular networks from time series protein or gene expression data provides valuable information about the complex biological processes of the cell. Causal network structure inference has been approached using different methods in the past. Most causal network inference techniques, such as Dynamic Bayesian Networks and ordinary differential equations, are limited by their computational complexity and thus make large scale inference infeasible. This is specifically true if a Bayesian framework is applied in order to deal with the unavoidable uncertainty about the correct model. We devise a novel Bayesian network reverse engineering approach using ordinary differential equations with the ability to include non-linearity. Besides modeling arbitrary, possibly combinatorial and time dependent perturbations with unknown targets, one of our main contributions is the use of Expectation Propagation, an algorithm for approximate Bayesian inference over large scale network structures in short computation time. We further explore the possibility of integrating prior knowledge into network inference. We evaluate the proposed model on DREAM4 and DREAM8 data and find it competitive against several state-of-the-art existing network inference methods.

Reconstruction of an Integrated Genome-Scale Co-Expression Network Reveals Key Modules Involved in Lung Adenocarcinoma.

Bidkhori G, Narimani Z, Hosseini Ashtiani S, Moeini A, Nowzari-Dalini A, Masoudi-Nejad A

PloS one. 2013 Jul 11;8(7):e67552.

Our goal in this study was to reconstruct a "genome-scale co-expression network" and find important modules in lung adenocarcinoma so that we could identify the genes involved in lung adenocarcinoma. We integrated gene mutation, GWAS, CGH, array-CGH and SNP array data in order to identify important genes and loci at genome scale. Afterwards, on the basis of the identified genes, a co-expression network was reconstructed from the co-expression data. The reconstructed network was named "genome-scale co-expression network". As the next step, 23 key modules were disclosed through clustering. In this study a number of genes have been identified for the first time to be implicated in lung adenocarcinoma by analyzing the modules. The genes EGFR, PIK3CA, TAF15, XIAP, VAPB, Appl1, Rab5a, ARF4, CLPTM1L, SP4, ZNF124, LPP, FOXP1, SOX18, MSX2, NFE2L2, SMARCC1, TRA2B, CBX3, PRPF6, ATP6V1C1, MYBBP1A, MACF1, GRM2, TBXA2R, PRKAR2A, PTK2, PGF and MYO10 are among the genes that belong to modules 1 and 22. All these genes, being implicated in at least one of the phenomena, namely cell survival, proliferation and metastasis, have an over-expression pattern similar to that of EGFR. In a few modules, genes such as CCNA2 (Cyclin A2), CCNB2 (Cyclin B2), CDK1, CDK5, CDC27, CDCA5, CDCA8, ASPM, BUB1, KIF15, KIF2C, NEK2, NUSAP1, PRC1, SMC4, SYCE2, TFDP1, CDC42 and ARHGEF9 are present that play a crucial role in cell cycle progression. In addition to the mentioned genes, there are some other genes (e.g. DLGAP5, BIRC5, PSMD2, Src, TTK, SENP2, PSMD2, DOK2, FUS, etc.) in the modules.

Genome-Scale Co-Expression Network Comparison across Escherichia coli and Salmonella enterica Serovar Typhimurium Reveals Significant Conservation at the Regulon Level of Local Regulators Despite Their Dissimilar Lifestyles

Zarrineh P, Sánchez-Rodríguez A, Hosseinkhan N, Narimani Z, Marchal K, Masoudi-Nejad A

PLoS One. 2014 Aug 7;9(8):e102871.

Availability of genome-wide gene expression datasets provides the opportunity to study gene expression across different organisms under a plethora of experimental conditions. In our previous work, we developed an algorithm called COMODO (COnserved MODules across Organisms) that identifies conserved expression modules between two species. In the present study, we expanded COMODO to detect the co-expression conservation across three organisms by adapting the statistics behind it. We applied COMODO to study expression conservation/divergence between Escherichia coli, Salmonella enterica, and Bacillus subtilis. We observed that some parts of the regulatory interaction networks were conserved between E. coli and S. enterica, especially in the regulon of local regulators. However, such conservation was not observed between the regulatory interaction networks of B. subtilis and the two other species. We found co-expression conservation on a number of genes involved in quorum sensing, but almost no conservation for genes involved in pathogenicity across E. coli and S. enterica, which could partially explain their different lifestyles. We concluded that despite their different lifestyles, no significant rewiring has occurred at the level of local regulons involved for instance, and notable conservation can be detected in signaling pathways and stress sensing in the phylogenetically close species S. enterica and E. coli. Moreover, conservation of local regulons seems to depend on the evolutionary time of divergence across species, disappearing at larger distances as shown by the comparison with B. subtilis. Global regulons follow a different trend and show major rewiring even at the limited evolutionary distance that separates E. coli and S. enterica.

A New Genetic Algorithm for Multiple Sequence Alignment

Narimani Z, Beigy H, Abolhassani H

International Journal of Computational Intelligence and Applications. 2012 Dec 14;11(04):1250023.

Multiple sequence alignment (MSA) is one of the basic and important problems in molecular biology. MSA can be used for different purposes including finding the conserved motifs and structurally important regions in protein sequences and determining evolutionary distance between sequences. Aligning several sequences cannot be done in polynomial time and therefore heuristic methods such as genetic algorithms can be used to find approximate solutions of MSA problems. Several algorithms based on genetic algorithms have been developed for this problem in recent years. Most of these algorithms use very complicated, problem-specific and time-consuming mutation operators. In this paper, we propose a new algorithm that uses a new way of population initialization and simple mutation and recombination operators. The strength of the proposed GA is its use of simple mutation operators and a special recombination operator that does not have the problems of similar recombination operators in other GAs. The experimental results show that the proposed algorithm is capable of finding good MSAs in contrast to existing methods, while it uses simple operators with low computational complexity.

Conference publications

Static Signature-based Malware Detection using Opcode and Binary Information

A Jalilian, Z Narimani, E Ansari

International Conference on Contemporary Issues in Data Science (CiDaS) 2019, Zanjan, Iran.

Reconstruction of CNV-related co-expression network in non small cell lung cancer (NSCLC)

G. Bidkhori, Z. Narimani, A. Masoudi-Nejad

4th Conference on Systems Biology of Mammalian Cells (SBMC) 2012, (presented as poster)

A New Genetic Algorithm for Multiple Sequence Alignment (in Persian)

Z Narimani, H Abolhassani, H Beigy

15th International CSI Computer Conference, Tehran, Iran (2010)

A Survey on Clustering Methods

M Saraee, N Ahmadian, Z Narimani

First International Conference on Data Mining (IDMC), Tehran, Iran (2007)

Books

Next Generation Sequencing and Sequence Assembly: methodologies and algorithms

A Masoudi-Nejad, Z Narimani, N Hosseinkhan

SpringerBriefs (2013)

Next Generation Sequencing methods and data analysis (in Persian)

K Baghaei, N Hosseinkhan, Z Narimani

University of Shahid Beheshti Press, Tehran, Iran (2017)

Next Generation Sequencing methods and data analysis