Mathematical and Physical Sciences

Sayan Goswami

Assistant Professor of Computer Science

PhD (Louisiana State University)

+91.79.61911535

https://sayangoswami.github.io

Research Interests: Big data, distributed computing, high performance computing (HPC), computational genomics

Profile

Professor Sayan Goswami received his doctorate in computer science in 2019 from Louisiana State University in Baton Rouge, USA. He received his BTech in 2011 from the National Institute of Technology (NIT), Durgapur, followed by a stint in the IT industry at Sapient Global Markets, where he worked on backend processes in energy trading. Before joining Ahmedabad University, he was an Assistant Professor of Computer Science at LSU Shreveport, USA. He has taught introductory to advanced courses in programming, object-oriented programming, object-oriented design, rapid GUI app development, and big data. His interdisciplinary research interests lie in the application of high-performance computing to process scientific big data, especially data produced in computational genomics. In the past, Professor Goswami has primarily concentrated on de novo whole genome assembly from a computational point of view. Specifically, his research has addressed the increase in memory and execution times required to analyse the ever-growing amount of genomic data. In addition to parallel and distributed computational techniques, his solutions employ big data algorithms such as sketching and streaming. In his current research project, he addresses similar problems encountered in processing metagenomes, working on solutions that use hardware accelerators, which yield more performance per rupee while requiring a smaller space and energy footprint.

Professor Sayan Goswami is an Assistant Professor in the Mathematical and Physical Sciences division at the School of Arts and Sciences.

Research

In the past, Professor Goswami has primarily concentrated on de novo whole genome assembly from a computational point of view. Specifically, his research has addressed the increase in memory and execution times required to analyse the ever-growing amount of genomic data. A part of genomics deals with the extraction of whole genome sequences of organisms for applications in personalised medicine, evolutionary biology, etc. This requires machines known as sequencers, which read the nucleotides of the genome and output them as text. Due to limitations in sequencing technology, sequencers cannot read the entire genome in one go. Instead, they clone the genome and read short segments from quasi-random positions in each clone. These overlapping segments, known as reads, are then merged in a process known as assembly. From a computational point of view, assembly can be modelled as the shortest common superstring problem. Common heuristic solutions use either overlap graphs or de Bruijn graphs. Building an overlap graph includes a compute-intensive step of finding overlaps between all pairs of reads in the dataset. In contrast, de Bruijn graphs are easier to build but require large amounts of memory for storage and subsequent processing. Professor Goswami’s research addresses the increase in memory requirements and execution times of assemblers caused by the genomic data explosion during the last decade.
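The two graph constructions mentioned above can be illustrated with a minimal Python sketch. It is not taken from Professor Goswami's assemblers; the function names, the toy reads, and the parameters `min_len` and `k` are hypothetical and chosen only for illustration. The overlap graph compares every ordered pair of reads, which is the compute-intensive step, while the de Bruijn graph is built in a single pass but stores one edge per k-mer, which is where the memory cost comes from.

```python
from collections import defaultdict
from itertools import permutations


def overlap_length(a, b, min_len=3):
    """Length of the longest suffix of `a` that is a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` characters exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def overlap_graph(reads, min_len=3):
    """Compare every ordered pair of reads (quadratic in the number of reads)."""
    edges = {}
    for a, b in permutations(reads, 2):
        n = overlap_length(a, b, min_len)
        if n:
            edges[(a, b)] = n
    return edges


def de_bruijn_graph(reads, k):
    """Add one edge per k-mer, from its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy reads (hypothetical), each overlapping the next by three characters.
reads = ["ATGGC", "GGCGT", "CGTGC"]
print(overlap_graph(reads))
print(de_bruijn_graph(reads, k=4))
```

Real assemblers replace the all-pairs loop with indexing or sketching techniques and store the de Bruijn graph in compact or distributed data structures; this sketch shows only the basic definitions.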

Publications

Sayan Goswami, Kisung Lee, Seung-Jong Park. "Distributed de novo assembler for large-scale long-read datasets", IEEE International Conference on Big Data (Big Data), 2020.
Sayan Goswami, Ayam Pokhrel, Kisung Lee, Ling Liu, Qi Zhang, Yang Zhou. "GraphMap: scalable iterative graph processing using NoSQL", The Journal of Supercomputing, 2019.
Arghya Kusum Das, Sayan Goswami, Kisung Lee, Seung-Jong Park. "A hybrid and scalable error correction algorithm for indel and substitution errors of long reads", BMC Genomics 20(11), 2019.
Shayan Shams, Sayan Goswami, Kisung Lee. "Deep Learning-Based Spatial Analytics for Disaster-Related Tweets: An Experimental Study", Proceedings of the 20th IEEE International Conference on Mobile Data Management (MDM), 2019.
Sayan Goswami, Kisung Lee, Shayan Shams, and Seung-Jong Park. "GPU-Accelerated Large-Scale Genome Assembly", Proceedings of the 32nd IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2018.
Shayan Shams, Sayan Goswami, Kisung Lee, Seungwon Yang, and Seung-Jong Park. "Towards Distributed Cyberinfrastructure for Smart Cities using Big Data and Deep Learning Technologies", Proceedings of the 38th IEEE International Conference on Distributed Computing Systems (ICDCS), vision track paper, 2018.
Arghya Kusum Das, Jaeki Hong, Sayan Goswami, Richard Platania, Kisung Lee, Wooseok Chang, Seung-Jong Park, and Ling Liu. "Augmenting Amdahl’s Second Law: A Theoretical Model to Build Cost-Effective Balanced HPC Infrastructure for Data-Driven Science", Proceedings of the 10th IEEE International Conference on Cloud Computing (CLOUD), 2017.
Arghya Kusum Das, Shayan Shams, Sayan Goswami, Richard Platania, Kisung Lee, and Seung-Jong Park. "ParSECH: Parallel Sequencing Error Correction with Hadoop for Large-Scale Genome", Proceedings of the 9th International Conference on Bioinformatics and Computational Biology (BICOB), 2017.
Arghya Kusum Das, Praveen Kumar Koppa, Sayan Goswami, Richard Platania, and Seung-Jong Park. "Large-scale parallel genome assembler over cloud computing environment", Journal of Bioinformatics and Computational Biology 15.03 (2017).
Sayan Goswami, Arghya Kusum Das, Richard Platania, Kisung Lee, and Seung-Jong Park. "Lazer: Distributed Memory-Efficient Assembly of Large-Scale Genomes", Proceedings of the IEEE International Conference on Big Data (IEEE BigData), 2016.
Praveen Kumar Koppa, Arghya Kusum Das, Sayan Goswami, Richard Platania, and Seung-Jong Park. "Giga: Giraph-based genome assembler for gigabase scale genomes", Proceedings of the 8th International Conference on Bioinformatics and Computational Biology (BICOB), 2016.
Chui-hui Chiu, Nathan Lewis, Dipak Kumar Singh, Arghya Kusum Das, Mohammad M. Jalazai, Richard Platania, Sayan Goswami, Kisung Lee, and Seung-Jong Park. "Bic-lsu: Big data research integration with cyberinfrastructure for lsu", Proceedings of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale, ACM, 2016.