My research interests include data and information management, data analytics, online communities, smart transportation, and Computer Science Education. Over the years, my research has looked at ways to support different stakeholders of information systems, including curators of digital objects, educators, developers, researchers, and policy-makers. My work has ranged from curation, indexing, and integration of data to using data mining approaches to gain a better understanding of disparate data. Incorporating online communities into digital libraries and identifying factors that can impact community participation has been another focus of my research.
Current Research Threads
Data management and integration: One of my current research projects at the University of Texas at El Paso (UTEP) is investigating data integration approaches to support initiatives linked to student success. Our team is working on identifying data sources, modeling the attributes and relationships of those sources, and building secure connections to communicate with the sources to facilitate information processing. This effort builds partly on my earlier work with information systems that index objects from different sources, which resulted in identifying effective modeling, representation, and recommendation techniques for digital libraries. We collaborated with Virginia Tech’s Digital Library Research Lab (DLRL) and our effort is supported by an NSF supplementary grant.
Knowledge propagation: Data integration efforts aim to bridge the gap between data captured by different initiatives. Similarly, interdisciplinary research integrates data, concepts, and approaches from various disciplines. Part of my current research focuses on using data analytics to identify research concepts from one domain that align with concepts from other domains, in order to facilitate interdisciplinary collaboration.
Smart transportation: I am collaborating with a group of researchers to analyze open datasets from smart cities to detect transportation status, especially mobility trends. We are also utilizing citizen data sources along with open data to investigate noise through the concept of geo-fencing.
Computer Science Education: Currently, I am collaborating with a team of undergraduate and graduate students who are developing an educational game to encourage middle school students to explore STEM areas. Our efforts were recognized at the Emerging Researchers Network in STEM (ERN) conference in 2017, where we secured first position for Undergraduate Poster Presentation in Computer Science and also won the first prize for the Video Contest Award. More on this project.
Some of the Past Projects
MetaShare: A data management system that builds a data integration framework to facilitate the retrieval, integration, and discovery of student success data collected by initiatives across campus. The goal is to define a framework institutions can adopt to integrate relevant information about disparate student success initiatives, and use that to assess the impact of those initiatives.
AlgoViz: This is a portal that provides a community-oriented virtual space for educators who use algorithm visualizations. The AlgoViz portal was officially launched at the beginning of Fall 2009. Currently, we are looking at how we can create an active online community. To increase user participation, we are offering various services as a way to lower the participation barrier for new users. We are also collecting traffic data and analyzing it to find trends that might be helpful in assisting existing users. AlgoViz provides metadata for its catalog entries following the OAI-PMH protocol.
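As an illustration of how a client could harvest catalog metadata over OAI-PMH, here is a minimal Python sketch. The `ListRecords` verb, the `oai_dc` metadata prefix, and the XML namespaces come from the OAI-PMH 2.0 and Dublin Core specifications. The base URL, the sample record, and the helper names `list_records_url` and `extract_titles` are assumptions made for this example and do not describe the AlgoViz deployment.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for an OAI-PMH repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        pairs.append((identifier, title))
    return pairs


# Hypothetical response body, hand-written for this example.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:entry-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Quicksort animation</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://example.org/oai"))  # placeholder base URL
print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would also follow the `resumptionToken` element that the protocol uses to page through large result sets; that step is left out here to keep the sketch short.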
Ensemble: A distributed portal for CS education. This is the other project I am involved in. At Ensemble, we collect computing education material, provide a place for communities to collaborate and share education material, and host various education tools.
InfoViz: This is a course project for analyzing cellular signaling pathways (STKE). Signaling pathways are relations between proteins that transform cellular signals into appropriate biological responses. Our observation is that a relation between two components of a signal can appear in more than one pathway, which might help biologists identify a new phenomenon. We developed a tool that visualizes the performance of clustering the pathways and links pathways that might not otherwise appear to have anything in common. More about this project.
Personal Information Management: Organizing files can be a daunting task for many. Organization often requires going through the contents of the files and making sure that a directory contains files with a similar theme. When categories overlap or are closely related to each other, it becomes even more cumbersome to decide where to finally place a file. Remembering the placement of such confusing files is also tricky. To ease the process of organizing and re-finding material, we developed a prototype called Content based Intelligent Organizer. This tool looks at the content of a file and uses this information to suggest relevant labels, which may already exist in the tool or be new. The tool learns from user behavior and uses that knowledge as guidance for the next set of label suggestions.
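The general idea of content-based label suggestion with feedback can be sketched in a few lines of Python. This is not the prototype's actual algorithm: the term-frequency profiles, the cosine similarity measure, the rule for proposing a new label, and the names `LabelSuggester`, `suggest`, and `accept` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class LabelSuggester:
    """Hypothetical suggester: one term-frequency profile per label."""

    def __init__(self):
        self.profiles = {}  # label -> Counter of terms seen under that label

    def suggest(self, text, top_n=3):
        """Return (existing labels ranked by similarity, candidate new labels)."""
        terms = Counter(tokenize(text))
        scored = [
            (cosine(terms, profile), label)
            for label, profile in self.profiles.items()
        ]
        existing = [
            label for score, label in sorted(scored, reverse=True) if score > 0
        ][:top_n]
        # Assumed heuristic: offer the file's most frequent term as a new label.
        new = [t for t, _ in terms.most_common(1) if t not in self.profiles]
        return existing, new

    def accept(self, text, label):
        """Learn from the user's choice by folding the file into the label profile."""
        self.profiles.setdefault(label, Counter()).update(tokenize(text))


suggester = LabelSuggester()
suggester.accept("gradient descent optimizer learning rate", "ml")
suggester.accept("flight hotel itinerary booking", "travel")
existing, new = suggester.suggest("learning rate schedule for the optimizer")
print(existing, new)
```

Each call to `accept` plays the role of user feedback: the accepted label's profile absorbs the file's terms, so later suggestions lean toward labels the user has actually chosen.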
Graph based document mining: My Master's thesis was on using an association rule mining algorithm for text documents. We created document graphs using WordNet, then used the FP-Growth algorithm to find frequent subgraphs and clustered the documents using those subgraphs. Because we used WordNet, the frequent subgraphs actually represented frequent senses appearing in the documents. Mining the FP-tree for a normal transaction database, for which FP-Growth was created, is easier than mining large document-graphs, because the items of a traditional transaction are independent and have no direct connection among them. In contrast, when we look for subgraphs within graphs, the items become related to each other in the context of subgraph similarity. The computation cost makes the original FP-tree mining approach somewhat inefficient for text documents. We modified FP-Growth to make it possible to generate subgraphs from the FP-tree for text documents.
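For readers unfamiliar with the baseline, the following is a minimal Python sketch of classic FP-Growth on an ordinary transaction database, the setting the algorithm was designed for. It is not the modified subgraph-mining version from the thesis, and the sample transactions and the names `Node`, `build_tree`, and `fp_growth` are made up for this example.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, a count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns a header table (item -> list of nodes) and the support of
    each frequent item.
    """
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root = Node(None, None)
    header = defaultdict(list)
    for items, count in weighted_transactions:
        # Keep frequent items, ordered by descending support (ties by name).
        ordered = sorted(
            (i for i in set(items) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Yield (itemset, support) for every frequent itemset."""
    header, frequent = build_tree(weighted_transactions, min_support)
    for item, nodes in header.items():
        itemset = suffix | {item}
        yield itemset, frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        conditional = []
        for node in nodes:
            path = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        yield from fp_growth(conditional, min_support, itemset)


transactions = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
]
result = dict(fp_growth([(t, 1) for t in transactions], min_support=3))
for itemset, count in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

Here every item is an independent symbol, so a prefix path in the tree is just a set of co-occurring items. With document-graphs, the items are edges whose combinations must also form connected subgraphs, which is the extra cost described above.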