**Research Interests:** Design And Analysis Of Algorithms For Problems With Applications In Bioinformatics/Computational Biology, Data Sciences, Distributed Systems And Operations Research.

Profile

Srikrishnan Divakaran completed his PhD in Computer Science in 2002 from Rutgers University, New Brunswick, USA. Then, from 2002 to 2008 he worked as an Assistant Professor in the Computer Science department at Hofstra University, Long Island, NY, as Associate Professor at DAIICT from 2009 to 2016 before joining School of Engineering and Applied Sciences at Ahmedabad University in 2017. Dr Divakaran has nearly 20 years of research and over 15 years of teaching experience and over five years of industry experience at leading multi-national companies in computing and finance.

Dr Divakaran has taught a wide range of courses in Computer Science as well as related disciplines like Bioinformatics and Operations Research and has a strong research background in designing algorithms for problems with applications in bioinformatics/computational biology, distributed systems and operations research. In terms of research, over the past 7 years, his interests have broadly been in the area of design and analysis of online and approximation algorithms for problems in Bioinformatics/Computational Biology, Distributed Systems and Operations Research. In Bioinformatics, his current research focus is on the design and analysis of approximation algorithms and heuristics for the following problems: (1) Constrained Generalized Tree Alignment, (2) Template-Based Methods for Sequence Alignment and (3) Fast Heuristics for Exact String Matching. In Distributed Systems, his research focus is in the design and analysis of online and offline approximation algorithms for problems in resource allocation, load balancing and list update. In Operations Research, his research interests have been in the design and analysis of online and approximation algorithms for bin packing and scheduling with set-ups.

Research

**Research Summary **

My research interests are broadly in the area of design and analysis of online and approximation algorithms for problems in Bioinformatics/Computational Biology, Machine Scheduling and Online Computation.

**Bioinformatics Research**

Constrained Generalized Tree Alignment: For a given set S of related biological sequences, the generalized tree alignment problem is the problem of constructing an evolutionary tree for S of minimum cost, where the cost of the tree is the sum of its edge costs and the cost of an edge represents either the mutational distance or the similarity of the biological sequences associated with the ends of the edge. This problem involves simultaneously constructing a phylogenetic tree and a minimum cost evolutionary tree for S. This problem is known to be MAX-SNP Hard and is one of the widely studied problems in Computational Biology.

In many instances of the generalized tree alignment problem, a partial phylogenetic tree is known either based on known biological relationships between sequences or based on clustering information. In these situations, there are no incentives for algorithms to exploit the knowledge of partial phylogenetic tree to construct evolutionary trees. However, if constraints are placed on the evolutionary tree topology, then there is an incentive for algorithms to use the partial phylogenetic tree to restrict the number of tree topologies it considers and construct biologically relevant minimum cost evolutionary tree for S. So, we proposed the following natural generalization of the generalized tree alignment problem:

Given a set of k related biological sequences and a phylogenetic forest comprising of node disjoint phylogenetic trees that specifies the topological constraints that any evolutionary tree needs to satisfy, construct a minimum cost evolutionary tree for S.

Template Based Methods for Sequence Alignment: Advances in biotechnology have made available massive amounts of functional, structural and genomic data for many biological sequences. This increased availability of heterogeneous biological data has resulted in biological applications where a multiple sequence alignment (MSA) is required for aligning similar features, where a feature is described in structural, functional or evolutionary terms. In these applications, for a given set of sequences, depending on the feature of interest the optimal MSA is likely to be different, and sequence similarity can only be used as a rough initial estimate on the accuracy of an MSA. This has motivated the growth in template-based heuristics that supplement the sequence information with evolutionary, structural and functional data and exploit feature similarity instead of sequence similarity to construct multiple sequence alignments that are biologically more accurate. However, current frameworks for designing template-based heuristics do not allow the user to explicitly specify the information that can help to classify features into types and associate weights signifying the relative importance of a feature with respect to other features, even though in many instances this additional information is readily available. This has resulted in the use of ad hoc measures and algorithms to define feature similarity and MSA construction respectively.

Fast Heuristics for Exact String Matching: The online exact string matching problem consists of finding all occurrences of a given pattern P of length n in a text T of length m. It is a problem in computer science that has been studied widely since 1970 and has applications in bioinformatics, computational biology, image processing, data compression and many other areas.

Our Results: For the constrained generalized tree alignment problem, I have designed constant approximation algorithms. These algorithms are computationally very fast and for the special case of generalized tree, alignment provide a guaranteed error bound of 2-2/k.

For the template-based alignment methods, my research attempts to provide a mechanism whereas a part of the template information the user can explicitly specify for each feature, its type, and weight. The type is to classify the features into different categories based on their characteristics and the weight signifies the relative importance of a feature with respect to other features in that sequence. Second, we exploit the above information to define scoring models for pair-wise sequence alignment that assume segment conservation as opposed to a single character (residue) conservation. Finally, we present a fast progressive alignment-based heuristic framework that helps in constructing a global MSA by first constructing an MSA involving only the informative segments using exact methods, and then stitch into this the alignment of non-informative segments constructed using fast approximate methods.

For the exact string matching problem, given a pattern string P of length n and a query string T of length m, where the characters of P and T are drawn from an alphabet of size Δ, the exact string matching problem consists of finding all occurrences of P in T. For this problem, we present algorithms that in O(nΔ2) time preprocess P to essentially identify sparsely (P), a rarely occurring substring of P, and then use it to find occurrences of P in T efficiently. Our algorithms require a worst-case search time of O(m), and expected search time of O(m/min(|sparse(P)|, Δ)), where |sparse(P)| is at least δ (i.e. the number of distinct characters in P), and for most pattern strings it is observed to be Ω(n1/2).

**Machine Scheduling Research**

Our Results: This work was done in collaboration with Prof. Michael Saks mostly as a part of my doctoral dissertation. We have been able to

Obtain new approximation results for the offline problem of scheduling a sequence of jobs on a single machine with sequence-independent set-up times for the weighted completion time and maximum lateness criteria. Our results were the first known polynomial-time constant approximation algorithms for this problem.

Design an O(1)-competitive online algorithm for a problem of scheduling a sequence of jobs on a single machine system with sequence-independent set-ups and arbitrary job release times with the objective of minimizing the maximum flow time. We also present proof of NP-completeness for the offline problem.

**Online Computation Research**

Given an ordered list ρ (an ordering among the items in L) and a request sequence σ, we need to determine how to reorganize L while serving σ so as to minimize the total servicing (access and list reorganization) cost.

This is a very hard problem that has been widely studied by theoretical computer scientists since the 1980’s and has widespread applications in distributed systems in the context of designing paging and resource allocation algorithms.

Our Results: For the static list update problem, given an ordered list ρ (an ordering of the list L = { a1, a2, ..., al }), and a sequence _ = (σ1, σ2, ..., σm) of requests for items in L, we characterize the list reorganizations in an optimal offline solution in terms of an initial permutation of the list followed by a sequence of m element transfers, where an element transfer is a type of list reorganization where only the requested item can be moved. Then we make use of this characterization to design an O(l2(l − 1)!m) time-optimal offline algorithm.

**Future Plans**

(a) Develop tools for pattern matching, and (b) develop frameworks for the design and analysis of alignment of biological sequences constrained by the biological context and using known biological information specified by the user using templates.

From the perspective of design and analysis of algorithms for problems in Bioinformatics/Computational Biology,my focus will be on designing fast and efficient approximation algorithms and heuristics for fundamental problems in Bioinformatics/Computational Biology like (i) Multiple Sequence Alignment Problem; (ii) Constrained Phylogenetic Tree Alignment Problem, and (iii) Exact/Approximate/Constrained Pattern Matching.

In theoretical computer science, my focus is on designing approximation problems for some central problems in Bin packing and Resource allocation. I am primarily looking at developing an alternate cost model that provides a good approximation of existing cost models and holds promise for designing good polynomial-time approximation schemes.

Publication

**Articles in Books and Journals**

- S. Divakaran and M. Saks, An Online Algorithm for a Problem in Scheduling with Set-ups and Release Times, Algorithmica 60(2): 301-315, 2011.
- S. Divakaran, Approximation algorithms for constrained generalized tree alignment problem, Discrete Applied Mathematics 157(7): 1407-1422, 2009.
- S. Divakaran and M. Saks, Approximation Algorithms for problems in Scheduling with set-ups, Discrete Applied Mathematics, Volume 156(5), 719-729, 2008.
- S. Divakaran, Approximation Algorithms for problems in Scheduling with Set-ups, Graduate Dissertations, Rutgers, The State University of New Jersey, 2001, ISBN: 0-493-47085-9.
- S. Divakaran and M. Saks, An Online Scheduling Problem with Job Set-ups, Center for Discrete Mathematics & Theoretical Computer Science, 2000-34.

- S.Divakaran, Fast Algorithms for Exact String Matching. (Proceedings of 2016 Winter School of Combinatorics, Charles University) CoRR abs/1509.09228), 2016.
- S. Divakaran, An Approximation Algorithm for a Multi-dimensional Resource Allocation Problem, Proceedings of the 13th International Conference on Combinatorial Optimization, CO2004, Lancaster, UK, pp 15, March 28-31, 2004.
- S. Divakaran and S. Divakaran, A framework for Request Scheduling in Web Servers, Special Focus on Next Generation Networks Technologies and Applications, DIMACS, Piscataway, New Jersey, December, 2002.
- S. Divakaran and M. Saks, Online Scheduling with release time and set-ups, Proceedings of the 12th International Conference on Combinatorial Optimization, CO2002, CNAM Paris, pp 43, April 8-10, 2002.
- S. Divakaran and M. Saks, Approximation Algorithms for Offline Scheduling with set-ups, Proceedings of the 12th International Conference on Combinatorial Optimization, CO2002, CNAM Paris, pp 44, April 8-10, 2002.
- M. Rangarajan, S. Divakaran, T. Nguyen and L. Iftode, Multi-threaded HLRC DSM, 1st Workshop on Software Distributed Shared Memory, Rhodes, Greece, June 25, 1999.

- S.Divakaran, A Fast Heuristic for Exact String Matching. CoRR abs/1512.03512 (2015)
- S.Divakaran, Fast Algorithms for Exact String Matching. CoRR abs/1509.09228 (2015)
- S. Divakaran, An Optimal Offline Algorithm for List Update. CoRR abs/1404.7638 (2014)
- S. Divakaran, A Fast Template Based Heuristic For Global Multiple Sequence Alignment. CoRR abs/1302.6030 (2013)
- S. Divakaran, Approximation Algorithms for Constrained Generalized Tree Alignment Problem, DIMACS Technical Report 2007-21.
- New Heuristics for Multiple Sequence Alignments, Technical Report, Hofstra University, 2006.
- S. Divakaran, Approximation Algorithms for a Multi-dimensional Resource Allocation Problem, DIMACS Technical Report 2006-24.
- S. Divakaran and M. Saks, Online Scheduling with release time and set-ups, DIMACS Technical Report 2001-50.
- S. Divakaran and M. Saks, Approximation Algorithms for Offline Scheduling with set-ups, DIMACS Technical Report 2001-49.

Teaching

Winter 2017: Math 502W: Advanced Statistics

Mon: 2:3:30pm (SEAS 112) Tue 10:00-11:30am (SEAS 211)