“Clear technical direction for complex software systems.”
Software Engineering Consulting for Complex Systems
I am an experienced software engineering consultant who works with startups, research teams, and software development groups. I evaluate and improve the interpersonal and technical processes they follow to design, develop, and maintain software.
I focus on interpersonal best practices and technological solutions that efficiently yield software meeting an organization's short-term and long-term needs, integrating traditional software engineering methodologies with state-of-the-art ML-based AI techniques.
I also troubleshoot failing software projects.
Typical engagements include:
An inquiry into the client's organization and key players
A review of their software needs and architectures
An evaluation of their codebase and persistent data
A proposal for better achieving their needs
Case Study: Design and Build the Software Infrastructure for Whole-Cell Modeling
Role: Associate Professor in the Department of Genetics and Genomic Sciences
Research institution: Mount Sinai School of Medicine
Objective:
Engineer software that advances progress toward the world's first computational models of the biochemistry of large bacteria or human cells. Such models could help us better understand how cells work and help advance medical research.
By analogy, weather forecasters use data about the weather (wind, temperature, etc.) and mathematical models that describe how upcoming weather will probably develop from current conditions. They implement the models as computer programs that weather organizations run frequently to create current weather forecasts.
In contrast, we worked on tools to forecast the behavior of biological cells in response to their external conditions, such as drug treatment.
Challenge:
A bacterial cell contains several thousand kinds of proteins, whereas a human cell may contain 100,000 or more different proteins. Numerous biochemical reactions transform molecules from one kind to another.
Requirements:
To build our computational models, we needed to obtain, store and validate detailed data about all of these molecules and reactions.
Software that made it easy for biomedical research scientists to read, understand, revise and save information about molecules and reactions did not exist, so we designed and built it.
Functionality and design:
We settled on a design that researchers found easy to use, that enabled us to build software which automatically caught and reported almost all errors in the data, and that ran quickly:
Data accessed by researchers was stored in spreadsheets, or in delimited text files that could be directly converted into spreadsheets.
When a spreadsheet was loaded into the software, each row corresponded to a single Python object.
Each column in the file corresponded to a field in the object. Metadata about a field, such as its datatype and whether it could be Null, was also stored in the spreadsheet.
The spreadsheets contained structured column and row headers, and a structured tab name for each worksheet, which defined the relationships between records and object names, and between column names and object attribute names.
In short, we invented a lightweight object-relational mapping system that used spreadsheets and was more convenient and faster than one built on database tables.
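The row-to-object mapping described above can be illustrated with a minimal sketch. It is not the ObjTables API: it uses only the Python standard library, and the file layout (a header row of column names followed by a row of type metadata, with a trailing "?" marking a field that may be null), the Molecule class, and the load_rows function are all hypothetical names chosen for illustration. The class is passed in explicitly here, whereas the design above derived it from the worksheet's tab name.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

# Hypothetical delimited file: row 1 holds column names, row 2 holds per-field
# metadata ("type", or "type?" when the field may be null), then data rows.
SHEET = """id,name,mass
str,str?,float
atp,ATP,507.18
h2o,,18.02
"""

# Converters for the type names allowed in the metadata row (an assumption).
CASTS = {"str": str, "int": int, "float": float}


@dataclass
class Molecule:
    id: str
    name: Optional[str]
    mass: float


def load_rows(text, cls):
    """Return (objects, errors); each valid data row becomes one object."""
    rows = list(csv.reader(io.StringIO(text)))
    names, specs, data = rows[0], rows[1], rows[2:]
    objects, errors = [], []
    # Data starts on the third line of the file, so number rows from 3.
    for row_num, row in enumerate(data, start=3):
        values, row_errors = {}, []
        for name, spec, cell in zip(names, specs, row):
            nullable = spec.endswith("?")
            type_name = spec.rstrip("?")
            where = f"row {row_num}, column {name}"
            if cell == "":
                if nullable:
                    values[name] = None
                else:
                    row_errors.append(f"{where}: a value is required")
                continue
            try:
                values[name] = CASTS[type_name](cell)
            except ValueError:
                row_errors.append(f"{where}: {cell!r} is not a valid {type_name}")
        if row_errors:
            errors.extend(row_errors)  # report the problems, skip the row
        else:
            objects.append(cls(**values))
    return objects, errors


molecules, problems = load_rows(SHEET, Molecule)
print(molecules)
print(problems)
```

Keeping the type metadata in the file itself means a researcher can see and edit the schema alongside the data, and every error message points back to a specific row and column in the spreadsheet.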
Results:
We call the tool ObjTables, as in "objects in spreadsheet tables". By providing a standardized way to represent and validate data, ObjTables promotes the reuse and integration of datasets across different projects and analyses.
ObjTables bridges the gap between human-readable spreadsheets and computational analysis in Python, particularly in scientific domains where structured data is crucial for modeling and simulation.
Software:
Publications:
Goldberg, A. P., & Karr, J. R. (2020). DE-Sim: an object-oriented, discrete-event simulation tool for data-intensive modeling of complex systems in Python. Journal of Open Source Software, 5(55), 2685.
Goldberg, A. P., Jefferson, D. R., Sekar, J. A., & Karr, J. R. (2020). Exact parallelization of the stochastic simulation algorithm for scalable simulation of large biochemical networks. arXiv preprint arXiv:2005.05295.
Porubsky, V. L., Goldberg, A. P., Rampadarath, A. K., Nickerson, D. P., Karr, J. R., & Sauro, H. M. (2020). Best practices for making reproducible biochemical models. Cell Systems, 11(2), 109-120.
Karr, J., & Goldberg, A. (2020, June). An introduction to whole-cell modeling.
Karr, J. R., Liebermeister, W., Goldberg, A. P., Sekar, J. A., & Shaikh, B. (2020). ObjTables: structured spreadsheets that promote data quality, reuse, and integration. arXiv preprint arXiv:2005.05227.
Szigeti, B., Roth, Y. D., Sekar, J. A., Goldberg, A. P., Pochiraju, S. C., & Karr, J. R. (2018). A blueprint for human whole-cell modeling. Current Opinion in Systems Biology, 7, 8-15.