Analytics Research

At the intersection of computer science, statistics, business, and society.

Selected Recent Analytics Publications:

Yang, Y., Lalor, J., Abbasi, A., & Zeng, D. (2025). Hierarchical deep document model. IEEE Transactions on Knowledge & Data Engineering, 37(1), 351–364.

Shi, P., & Zhao, Z. (2024). Enhanced pricing and management of bundled insurance risks with dependence-aware prediction using pair copula construction. Journal of Econometrics, 240(1).

Grisold, T., Seidel, S., Heck, M., & Berente, N. (2024). Digital surveillance in organizations. Business & Information Systems Engineering.

Xu, H., Wang, D., Zhao, Z., & Yu, Y. (2024). Change-point inference in high-dimensional regression models under temporal dependence. The Annals of Statistics, 52(3), 999–1026.

Zhao, Z., Ma, T. F., Ng, W. L., & Yau, C. Y. (2024). A composite likelihood-based approach for change-point detection in spatio-temporal processes. Journal of the American Statistical Association, 119(548), 3086–3100.

Anderson, S. F., & Kelley, K. (2024). Sample size planning for replication studies: The devil is in the design. Psychological Methods.

Cai, J., Gu, X., Zhao, L., Zhu, W. (2023). “State Ownership in China: An Equity Network Perspective.” The Arc of Chinese Economy. Brookings Institution Press.

Somanchi, S., Abbasi, A., Kelley, K., Dobolyi, D., and Yuan, T.T. (2023) Examining User Heterogeneity in Digital Experiments. ACM Transactions on Information Systems (TOIS), 41(4), pp. 1-34.

Yang, K., Lau, R.Y.K., and Abbasi, A. (2023) Getting Personal: A Deep Learning Artifact for Text-based Measurement of Personality, Information Systems Research.

Youyou, W., Yang, Y., & Uzzi, B. (2023) A discipline-wide investigation of replicability in Psychology over the past 20 years. Proceedings of the National Academy of Sciences, 120.

Jakubowski, B., Somanchi, S., McFowland III, E., and Neill, D.B. (2023) Exploiting Discovered Regression Discontinuities to Debias Conditioned-on-Observable Estimators. Journal of Machine Learning Research (JMLR).

Lalor, J., & Rodriguez, P. (2022). Py-IRT: A scalable item response theory library for Python. INFORMS Journal on Computing, 35(1), 5-13.

Lalor, J. P., Yang, Y., Smith, K., Forsgren, N., and Abbasi, A. (2022) “Benchmarking intersectional biases in NLP.” In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 3598-3609.

Padilla, C., Wang, D., Zhao, Z., & Yu, Y. (2022) Change-point Detection for Sparse and Dense Functional Data in General Dimensions. Advances in Neural Information Processing Systems.

Guo, Y., Yang, Y., and Abbasi, A. (2022) “Auto-Debias: Debiasing Masked Language Models with Automated Biased Prompts.” In the 60th Annual Meeting of the Association for Computational Linguistics (ACL).

Ogburn, E., Cai, J., Kuchibhotla, A., & Berk, R. (2022). Practical issues concerning assumption-lean inference for generalized linear models. Journal of the Royal Statistical Society: Series B, Statistical Methodology.

Yang, Y., Tian, T.Y., Woodruff, T.K., Jones, B.F., and Uzzi, B. (2022) Gender-diverse Teams Produce More Novel and Higher Impact Scientific Ideas. Proceedings of the National Academy of Sciences.

Wang, D., Zhao, Z., Yu, Y., Willett, R. (2022) Functional linear regression with mixed predictors, Journal of Machine Learning Research.

Wowak, K. D., Handley, S. M., Kelley, K., & Angst, C. M. (2022) Strategic sourcing of multi-component software systems: The case of electronic medical records. Decision Sciences Journal.

Zhao, Z., Jiang, F., and Shao, X. (2022). Segmenting Time Series via Self-Normalisation. Journal of the Royal Statistical Society – Series B, 84(5), 1699–1725.

Zhao, Z., Shi, P., & Zhang, Z. (2021). Modeling multivariate time series with copula-linked univariate D-vines. Journal of Business & Economic Statistics, 40(2), 690–704.

Abbasi, A., Dobolyi, D., Vance, A., & Zahedi, F. M. (2021). The phishing funnel model: A design artifact to predict user susceptibility to phishing websites. Information Systems Research, 32(2), 410-436.

Abbasi, A., Dobolyi, D., Lalor, J.P., Netemeyer, R., Smith, K., and Yang, Y. (2021) Constructing a Psychometric Testbed for Fair Natural Language Processing. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing.

Rodriguez, P., Barrow, J., Hoyle, A.M., Lalor, J.P., Jia, R. and Boyd-Graber, J. (2021) Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards? In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers).

He, Y., Peng, L., Zhang, D., & Zhao, Z. (2021). Risk analysis via generalized Pareto distributions. Journal of Business & Economic Statistics, 40(2), 852–867.

Wang, D., Zhao, Z., Lin, K., Willett, R. (2021) Statistically and computationally efficient change point localization in regression settings, Journal of Machine Learning Research.

Zhao, Z., Shi, P., & Feng, X. (2021) Knowledge Learning of Insurance Risks Using Dependence Models. INFORMS Journal on Computing.

Jiang, F., Zhao, Z., Shao, X. (2021) Modelling the COVID-19 infection trajectory: A piecewise linear quantile trend model, Journal of the Royal Statistical Society – Series B, with discussion.

Ahmad, F., Abbasi, A., Kitchens, B., Adjeroh, D. A., & Zeng, D. (2020). Deep Learning for Adverse Event Detection from Web Search. IEEE Transactions on Knowledge and Data Engineering.

Ahmad, F., Abbasi, A., Li, J., Dobolyi, D. G., Netemeyer, R. G., Clifford, G. D., & Chen, H. (2020). A Deep Learning Architecture for Psychometric Natural Language Processing. ACM Transactions on Information Systems (TOIS), 38(1), 1-29.

Jiang, F., Zhao, Z., & Shao, X. (2020). Time series analysis of COVID-19 infection curve: A change-point perspective. Journal of Econometrics.

Kuchibhotla, A. K., Brown, L. D., Buja, A., Cai, J., George, E. I., Zhao, L. (2020). “Valid Post-selection Inference in Model-free Linear Regression.” Annals of Statistics, 48(5), 2953-2981.

Lalor, J.P., Yu, H. (2020). Dynamic Data Selection for Curriculum Learning via Ability Estimation. Conference on Empirical Methods in Natural Language Processing.

Shi, P., & Zhao, Z. (2020). Regression for copula-linked compound distributions with applications in modeling aggregate insurance claims. Annals of Applied Statistics, 14(1), 357-380.

Tofighi, D., & Kelley, K. (2020). Improved inference in mediation analysis: Introducing the model-based constrained optimization procedure. Psychological Methods.

Tofighi, D., & Kelley, K. (2020). Indirect effects in sequential mediation models: Evaluating methods for hypothesis testing and confidence interval formation. Multivariate Behavioral Research, 55, 188–210.

Traeger, M. L., Sebo, S. S., Jung, M., Scassellati, B., & Christakis, N. A. (2020). Vulnerable robots positively shape human conversational dynamics in a human–robot team. Proceedings of the National Academy of Sciences, 117(12), 6370-6375.

Yang, Y., Wu, Y., and Uzzi, B. (2020). Estimating the ‘Deep-Replicability’ of Scientific Findings Using Human and Machine Intelligence. Proceedings of the National Academy of Sciences, 117 (20) 10762-10768.

Yang, Y., Pah, A., & Uzzi, B. (2019) Quantifying the Future Lethality of Terror Organizations. Proceedings of the National Academy of Sciences, 116.

Yang, Y., Chawla, N., Uzzi, B. (2019) A network’s gender composition and communication pattern predict women’s leadership success. Proceedings of the National Academy of Sciences, 116.

Lalor, J.P., Wu, H., Yu, H. (2019). Learning Latent Parameters without Human Response Patterns: Item Response Theory with Artificial Crowds. Conference on Empirical Methods in Natural Language Processing.

McNeish, D., & Kelley, K. (2019). Fixed effects models versus mixed effects models for clustered data: Reviewing the approaches, disentangling the differences, and making recommendations. Psychological Methods, 24(1), 20.

Kelley, K., Bilson Darku, F., & Chattopadhyay, B. (2019). Sequential accuracy in parameter estimation for population correlation coefficients. Psychological Methods, 24(4), 492.

Cai, J., Mandelbaum, A., Nagaraja, C. H., Shen, H., Zhao, L. (2019). “Statistical Theory Powering Data Science.” Statistical Science, 34(4), 669-691.

Analytics Research Faculty: