Amit Sheth

Lexington County, South Carolina, United States
6K followers 500+ connections

About

Educator, Researcher, and Entrepreneur.

Prof. Sheth is working towards a vision…

Experience

  • University of South Carolina

    Columbia, South Carolina, United States

  • Columbia, South Carolina Area

  • Dayton, Ohio Area

  • http://www.edamam.com/

  • Louisville, Kentucky Area

  • Dayton, Ohio Area

  • Dayton, Ohio Area

  • Athens, Georgia Area

  • Piscataway, New Jersey, United States

  • Pilani, Rajasthan, India

Education

  • The Ohio State University

    MS and PhD in Computer & Information Science

    Activities and Societies: Chair, Graduate Students Council

Publications

  • "Time for dabs": Analyzing Twitter data on marijuana concentrates across the U.S.

    Drug and Alcohol Dependence

    Abstract
    AIMS:
    Media reports suggest increasing popularity of marijuana concentrates ("dabs"; "earwax"; "budder"; "shatter"; "butane hash oil") that are typically vaporized and inhaled via a bong, vaporizer or electronic cigarette. However, data on the epidemiology of marijuana concentrate use remain limited. This study aims to explore Twitter data on marijuana concentrate use in the U.S. and identify differences across regions of the country with varying cannabis legalization policies.
    METHODS:
    Tweets were collected between October 20 and December 20, 2014, using Twitter's streaming API. A Twitter data filtering framework was available through the eDrugTrends platform. Raw and adjusted percentages of dabs-related tweets per state were calculated. A permutation test was used to examine differences in the adjusted percentages of dabs-related tweets among U.S. states with different cannabis legalization policies.
    RESULTS:
    eDrugTrends collected a total of 125,255 tweets. Almost 22% (n=27,018) of these tweets contained identifiable state-level geolocation information. Dabs-related tweet volume for each state was adjusted using a general sample of tweets to account for different levels of overall tweeting activity for each state. Adjusted percentages of dabs-related tweets were highest in states that allowed recreational and/or medicinal cannabis use and lowest in states that have not passed medical cannabis use laws. The differences were statistically significant.
    CONCLUSIONS:
    Twitter data suggest greater popularity of dabs in the states that legalized recreational and/or medical use of cannabis. The study provides new information on the epidemiology of marijuana concentrate use and contributes to the emerging field of social media analysis for drug abuse research.

    KEYWORDS:
    Cannabis; Marijuana concentrates; Marijuana legalization; Social media; Twitter
    PMID: 26338481 [PubMed - in process] PMCID: PMC4581982
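
    The METHODS above describe two concrete computations: per-state tweet volumes adjusted by overall tweeting activity, and a permutation test over policy groups. A minimal sketch of both in Python follows; the state counts, grouping, and variable names are illustrative assumptions, not the study's data.

```python
# Sketch (not the authors' code): adjusting per-state tweet counts and testing
# group differences with a permutation test, as described in the METHODS above.
# All variable names and the toy data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# dabs_counts[s]: dabs-related tweets geolocated to state s
# baseline_counts[s]: tweets from a general sample for state s (overall activity)
dabs_counts     = {"CO": 900, "WA": 700, "OH": 300, "TX": 250}
baseline_counts = {"CO": 40_000, "WA": 35_000, "OH": 50_000, "TX": 90_000}

# Adjusted percentage: dabs tweet volume relative to the state's overall activity
adjusted = {s: 100.0 * dabs_counts[s] / baseline_counts[s] for s in dabs_counts}

def permutation_test(group_a, group_b, n_perm=10_000):
    """Two-sided permutation test on the difference of group means."""
    observed = np.mean(group_a) - np.mean(group_b)
    pooled = np.concatenate([group_a, group_b])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = pooled[: len(group_a)].mean() - pooled[len(group_a):].mean()
        count += abs(diff) >= abs(observed)
    return count / n_perm

# States grouped by cannabis policy (illustrative grouping)
legal  = np.array([adjusted["CO"], adjusted["WA"]])
no_law = np.array([adjusted["OH"], adjusted["TX"]])
print("p =", permutation_test(legal, no_law))
```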

    Other authors
    See publication
  • Implicit Entity Recognition in Clinical Documents

    Fourth Joint Conference on Lexical and Computational Semantics (*SEM)

    With the increasing automation of health care information processing, it has become crucial to extract meaningful information from textual notes in electronic medical records. One of the key challenges is to extract and normalize entity mentions. State-of-the-art approaches have focused on the recognition of entities that are explicitly mentioned in a sentence. However, clinical documents often contain phrases that indicate the entities but do not contain their names. We term those implicit entity mentions and introduce the problem of implicit entity recognition (IER) in clinical documents. We propose a solution to IER that leverages entity definitions from a knowledge base to create entity models, projects sentences to the entity models and identifies implicit entity mentions by evaluating semantic similarity between sentences and entity models. The evaluation with 857 sentences selected for 8 different entities shows that our algorithm outperforms the most closely related unsupervised solution. The similarity value calculated by our algorithm proved to be an effective feature in a supervised learning setting, helping it to improve over the baselines, and achieving F1 scores of .81 and .73 for different classes of implicit mentions. Our gold standard annotations are made available to encourage further research in the area of IER.
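
    The core step described above, projecting sentences onto entity models built from knowledge-base definitions and scoring semantic similarity, can be sketched as follows. This is not the paper's implementation: TF-IDF vectors stand in for its entity models, and the definitions and threshold are toy assumptions.

```python
# Minimal sketch of the scoring idea (not the authors' implementation):
# build an "entity model" vector from a knowledge-base definition, project a
# sentence into the same space, and flag an implicit mention when the
# similarity clears a threshold. Entity definitions here are toy stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

entity_definitions = {
    "myocardial infarction": "blockage of blood flow to the heart muscle "
                             "causing chest pain and tissue death",
    "pneumonia": "infection that inflames air sacs in the lungs, which may "
                 "fill with fluid",
}

sentence = "patient reports crushing chest pain with reduced blood flow to the heart"

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(list(entity_definitions.values()) + [sentence])
entity_vecs, sentence_vec = matrix[:-1], matrix[-1]

for name, score in zip(entity_definitions, cosine_similarity(entity_vecs, sentence_vec)):
    if score[0] > 0.1:  # illustrative threshold
        print(f"implicit mention of {name!r} (similarity {score[0]:.2f})")
```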

    Other authors
    See publication
  • Gender-Based Violence in 140 Characters or Fewer: A #BigData Case Study of Twitter

    Kno.e.sis Tech. Report 2015

    Humanitarian and public institutions are increasingly relying on data from social media sites to measure public attitude, and provide timely public engagement. Such engagement supports the exploration of public views on important social issues such as gender-based violence (GBV). In this study, we examine Big (Social) Data consisting of nearly fourteen million tweets collected from the Twitter platform over a period of ten months to analyze public opinion regarding GBV, highlighting the nature of tweeting practices by geographical location and gender. The exploitation of Big Data requires the techniques of Computational Social Science to mine insight from the corpus while accounting for the influence of both transient events and sociocultural factors. We reveal public awareness regarding GBV tolerance and suggest opportunities for intervention and the measurement of intervention effectiveness assisting both governmental and non-governmental organizations in policy development.

    Other authors
    See publication
  • FACES: Diversity-Aware Entity Summarization using Incremental Hierarchical Conceptual Clustering

    AAAI

    Semantic Web documents that encode facts about entities on the Web have been growing rapidly in size and evolving over time. Creating summaries on lengthy Semantic Web documents for quick identification of the corresponding entity has been of great contemporary interest. In this paper, we explore automatic summarization techniques that characterize and enable identification of an entity and create summaries that are human friendly. Specifically, we highlight the importance of diversified (faceted) summaries by combining three dimensions: diversity, uniqueness, and popularity. Our novel diversity-aware entity summarization approach mimics human conceptual clustering techniques to group facts, and picks representative facts from each group to form concise (i.e., short) and comprehensive (i.e., improved coverage through diversity) summaries.
    We evaluate our approach against the state-of-the-art techniques and show that our work improves both the quality and the efficiency of entity summarization.
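
    The idea of diversity-aware summarization above, grouping an entity's facts into facets and picking the best-ranked representative from each, can be illustrated with a small sketch. The facts, facet labels, and scores are invented for the example; the actual FACES algorithm derives its groups and ranks from incremental hierarchical conceptual clustering, uniqueness, and popularity.

```python
# Illustrative sketch of the faceted-summary idea from the abstract (not the
# FACES algorithm itself): cluster an entity's facts into facets, then take
# the highest-ranked fact from each facet so the summary stays both short
# and diverse. Facts, facet labels, and scores are made up for the example.
from collections import defaultdict

# (facet, fact, rank) where rank combines uniqueness and popularity
facts = [
    ("career", "playsFor: Example FC",    0.9),
    ("career", "position: striker",       0.6),
    ("origin", "birthPlace: Springfield", 0.8),
    ("origin", "nationality: Examplandia", 0.5),
    ("family", "spouse: Jane Doe",        0.7),
]

def summarize(facts, k=3):
    facets = defaultdict(list)
    for facet, fact, rank in facts:
        facets[facet].append((rank, fact))
    # one representative per facet (diversity), best-ranked first
    reps = sorted((max(group) for group in facets.values()), reverse=True)
    return [fact for _, fact in reps[:k]]

print(summarize(facts))
```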

    Citation:
    Kalpa Gunaratna, Krishnaprasad Thirunarayan, and Amit Sheth. 'FACES: Diversity-Aware Entity Summarization using Incremental Hierarchical Conceptual Clustering'. 29th AAAI Conference on Artificial Intelligence (AAAI 2015), AAAI, 2015.
    http://knoesis.org/library/resource.php?id=2023

    Other authors
    See publication
  • Context-Driven Automatic Subgraph Creation for Literature-Based Discovery

    Journal of Biomedical Informatics

    Background: Literature-based discovery (LBD) is characterized by uncovering hidden associations in non-interacting scientific literature. Prior approaches to LBD include use of: 1) domain expertise and structured background knowledge to manually filter and explore the literature, 2) distributional statistics and graph-theoretic measures to rank interesting connections, and 3) heuristics to help eliminate spurious connections. However, manual approaches to LBD are not scalable and purely distributional approaches may not be sufficient to obtain insights into the meaning of poorly understood associations. While several graph-based approaches have the potential to elucidate associations, their effectiveness has not been fully demonstrated. A considerable degree of a priori knowledge, heuristics, and manual filtering is still required. Objectives: In this paper we implement and evaluate a context-driven, automatic subgraph creation method that captures multifaceted complex associations between biomedical concepts to facilitate LBD. Given a pair of concepts, our method automatically generates a ranked list of subgraphs, which provide informative and potentially unknown associations between such concepts.

    Keywords:
    Literature-based discovery (LBD); Graph mining; Path clustering; Hierarchical agglomerative clustering; Semantic relatedness; Medical Subject Headings (MeSH)

    Journal of Biomedical Informatics
    Volume 54, April 2015, Pages 141–157

    Other authors
    See publication
  • Comparative Analysis of Online Health Queries Originating from Personal Computers and Smart Devices on a Consumer Health Information Portal

    Journal of Medical Internet Research (Impact factor 3.8)

    The number of people using the Internet and smart devices for health information seeking is increasing rapidly. In this study, we analyzed how device choice (desktops/laptops vs smartphones/tablets) impacts online health information seeking.

    Other authors
  • Active Learning with Efficient Feature Weighting Methods for Improving Data Quality and Classification Accuracy

    In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL'14)

    Many machine learning datasets are noisy, with a substantial number of mislabeled instances. This noise yields sub-optimal classification performance. In this paper we study a large, low-quality annotated dataset, created quickly and cheaply using Amazon Mechanical Turk to crowdsource annotations. We describe computationally cheap feature weighting techniques and a novel non-linear distribution spreading algorithm that can be used to iteratively and interactively correct mislabeled instances, significantly improving annotation quality at low cost. Eight different emotion extraction experiments on Twitter data demonstrate that our approach is just as effective as more computationally expensive techniques, while saving a considerable amount of time.

    Other authors
    See publication
  • Alignment and dataset identification of linked data in Semantic Web

    Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

    The Linked Open Data (LOD) cloud has gained significant attention in the Semantic Web community over the past few years. With rapid expansion in size and diversity, it consists of over 800 interlinked datasets with over 60 billion triples. These datasets encapsulate structured data and knowledge spanning varied domains such as entertainment, life sciences, publications, geography, and government. Applications can take advantage of this by using the knowledge distributed over the interconnected datasets, which is not realistic to find in a single place elsewhere. However, two of the key obstacles in using the LOD cloud are the limited support for data integration tasks over concepts, instances, and properties, and relevant data source selection for querying over multiple datasets. We review, in brief, some of the important and interesting technical approaches found in the literature that address these two issues. We observe that the general-purpose alignment techniques developed outside the LOD context fall short in meeting the heterogeneous data representation of LOD. Therefore, an LOD-specific review of these techniques (especially for alignment) is important to the community. The topics covered and discussed in this article fall under two broad categories, namely alignment techniques for LOD datasets and relevant data source selection in the context of query processing over LOD datasets.

    Other authors
    See publication
  • YouRank: Let User Engagement Rank Microblog Search Results

    The Eighth International AAAI Conference on Weblogs and Social Media (ICWSM 2014)

    We propose an approach for ranking microblog search results. The basic idea is to leverage user engagement for the purpose of ranking: if a microblog post received many retweets/replies, this means users find it important and it should be ranked higher. However, simply applying the raw count of engagement may bias the ranking by favoring posts from celebrity users whose posts generally receive a disproportionate amount of engagement regardless of the contents of posts. To reduce this bias, we propose a variety of time window-based outlier features that transfer the raw engagement count into an importance score, on a per user basis. The evaluation on five real-world datasets confirms that the proposed approach can be used to improve microblog search.
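
    A minimal sketch of the per-user normalization idea described above: instead of ranking by raw engagement, score each post relative to the same user's history within a time window. The z-score feature and the toy numbers are assumptions for illustration, not the paper's exact outlier features.

```python
# Sketch of the per-user normalization idea in the abstract (assumed detail,
# not the paper's exact features): turn a post's raw retweet count into an
# importance score relative to the same user's recent history, so celebrity
# accounts don't dominate the ranking.
import statistics

def importance(post_count: int, user_history: list[int]) -> float:
    """Z-score of this post's engagement against the user's own time window."""
    mean = statistics.fmean(user_history)
    stdev = statistics.pstdev(user_history) or 1.0  # guard against zero spread
    return (post_count - mean) / stdev

# A celebrity whose posts routinely get ~10k retweets...
print(importance(10_100, [9_500, 10_200, 10_000, 9_800]))  # unremarkable
# ...versus an ordinary user whose post suddenly gets 300.
print(importance(300, [2, 5, 3, 4]))                       # strong outlier
```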

    Other authors
    See publication
  • User Interests Identification on Twitter Using a Hierarchical Knowledge Base

    Extended Semantic Web Conference 2014 (To Appear)

    Industry and researchers have identified numerous ways to monetize microblogs for personalization and recommendation. A common challenge across these different works is identification of user interests. Although techniques have been developed to address this challenge, a flexible approach that spans multiple levels of granularity in user interests has not been forthcoming.

    In this work, we focus on exploiting hierarchical semantics of concepts to infer richer user interests expressed as a Hierarchical Interest Graph. To create such graphs, we utilize a user's Twitter data to first ground potential user interests to structured background knowledge such as the Wikipedia Category Graph. We then use an adaptation of spreading activation theory to assign a user interest score (or weight) to each category in the hierarchy. The Hierarchical Interest Graph comprises not only a user's explicitly mentioned interests determined from Twitter, but also their implicit interest categories inferred from the background knowledge source. We demonstrate the effectiveness of our approach through a user study which shows that, on average, approximately eight of the top ten weighted categories in the graph are relevant to a given user's interests.
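
    A minimal sketch of the spreading-activation step described above, assuming a toy category hierarchy in place of the Wikipedia Category Graph; the decay factor and weights are illustrative, not the paper's parameters.

```python
# A minimal sketch of spreading activation over a category hierarchy, the
# mechanism the abstract describes for scoring interest categories. The tiny
# graph, decay factor, and scores are illustrative assumptions, not the
# paper's parameters or the Wikipedia Category Graph itself.
from collections import defaultdict

# child category -> parent categories (toy slice of a hierarchy)
parents = {
    "FC Barcelona": ["Football clubs"],
    "Football clubs": ["Football"],
    "Football": ["Sports"],
}

def spread(initial: dict[str, float], decay: float = 0.5) -> dict[str, float]:
    """Propagate interest scores from grounded entities up to ancestors."""
    scores = defaultdict(float, initial)
    frontier = list(initial.items())
    while frontier:
        node, activation = frontier.pop()
        for parent in parents.get(node, []):
            passed = activation * decay
            scores[parent] += passed
            frontier.append((parent, passed))
    return dict(scores)

# e.g., tweets grounded to the entity/category "FC Barcelona" with weight 1.0
print(spread({"FC Barcelona": 1.0}))
```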

    Other authors
    See publication
  • User Interests Identification on Twitter Using a Hierarchical Knowledge Base

    Extended Semantic Web Conference 2014 (To Appear)

    Twitter, due to its massive growth as a social networking platform, has been in focus for the analysis of its user-generated content for personalization and recommendation tasks. A common challenge across these tasks is identifying user interests from tweets. Lately, semantic enrichment of Twitter posts to determine (entity-based) user interests has been an active area of research. The advantages of these approaches include interoperability, information reuse and the availability of knowledge bases to be exploited. However, exploiting these knowledge bases for identifying user interests still remains a challenge. In this work, we focus on exploiting hierarchical relationships present in knowledge bases to infer richer user interests expressed as a Hierarchical Interest Graph. We argue that the hierarchical semantics of concepts can enhance existing systems to personalize or recommend items based on varied levels of conceptual abstractness. We demonstrate the effectiveness of our approach through a user study which shows that, on average, approximately eight of the top ten weighted hierarchical interests in the graph are relevant to a given user's interests.

    Other authors
  • An Information Filtering and Management Model for Twitter Traffic to Assist Crisis Response Coordination

    Journal of CSCW, Springer

    A model for filtering information using psycholinguistic theories to identify tacit cooperation in declarations of resource needs and availability during disaster response on social media. Also, a domain ontology to create an annotated information repository supporting varying abstract presentations of organized, actionable information nuggets regarding resource needs and availability in visual interfaces, as well as complex who-what-where querying for coordination.

    Other authors
    See publication
  • Analysis of Online Information Searching for Cardiovascular Diseases on a Consumer Health Information Portal

    AMIA Annual Symposium

    Since the early 2000s, Internet usage for health information searching has increased significantly. Studying search queries can help us understand users' "information need" and how they formulate search queries ("expression of information need"). Although cardiovascular diseases (CVD) affect a large percentage of the population, few studies have investigated how and what users search about CVD. We address this knowledge gap in the community by analyzing a large corpus of 10 million CVD-related search queries from MayoClinic.com. Using UMLS MetaMap and UMLS semantic types/concepts, we developed a rule-based approach to categorize the queries into 14 health categories. We analyzed the structural properties, types (keyword-based/Wh-questions/Yes-No questions) and linguistic structure of the queries. Our results show that the most searched health categories are 'Diseases/Conditions', 'Vital-Signs', 'Symptoms' and 'Living-with'. CVD queries are longer and predominantly keyword-based. This study extends our knowledge about online health information searching and provides useful insights for Web search engines and health websites.
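
    A toy sketch of rule-based query categorization in the spirit of the study; the real pipeline maps queries to UMLS semantic types/concepts with MetaMap, whereas the keyword rules and the category slice here are illustrative stand-ins.

```python
# Toy sketch of rule-based query categorization (illustrative stand-in for
# the study's MetaMap/UMLS pipeline). Rules and categories are assumptions.
CATEGORY_RULES = {
    "Symptoms": {"pain", "shortness of breath", "palpitations"},
    "Diseases/Conditions": {"heart attack", "arrhythmia", "hypertension"},
    "Living-with": {"diet", "exercise", "recovery"},
}

def categorize(query: str) -> list[str]:
    q = query.lower()
    return [cat for cat, terms in CATEGORY_RULES.items()
            if any(term in q for term in terms)] or ["Other"]

print(categorize("chest pain after heart attack recovery"))
# ['Symptoms', 'Diseases/Conditions', 'Living-with']
```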

    Other authors
  • Don't like RDF Reification? Making Statements about Statements using Singleton Property

    Proceedings of International Conference on World Wide Web (WWW2014)

    Statements about RDF statements, or meta triples, provide additional information about individual triples, such as the source, the occurring time or place, or the certainty. Integrating such meta triples into semantic knowledge bases would enable the querying and reasoning mechanisms to be aware of provenance, time, location, or certainty of triples. However, an efficient RDF representation for such meta knowledge of triples remains challenging. The existing standard reification approach allows such meta knowledge of RDF triples to be expressed using RDF in two steps. The first step is representing the triple by a Statement instance which has its subject, predicate, and object indicated separately in three different triples. The second step is creating assertions about that instance as if it is a statement. While reification is simple and intuitive, this approach does not have formal semantics and is not commonly used in practice as described in the RDF Primer. In this paper, we propose a novel approach called Singleton Property for representing statements about statements and provide a formal semantics for it. We explain how this singleton property approach fits well with the existing syntax and formal semantics of RDF, and the syntax of the SPARQL query language. We also demonstrate the use of singleton properties in the representation and querying of meta knowledge in two examples of Semantic Web knowledge bases: YAGO2 and BKR. Our experiments on the BKR show that the singleton property approach gives decent performance in terms of the number of triples, query length and query execution time compared to existing approaches. This approach, which is also simple and intuitive, can be easily adopted for representing and querying statements about statements in other knowledge bases.
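
    The pattern itself is compact enough to sketch with rdflib: mint a one-off property for the statement, link it to the generic property, and attach the meta triples to it. The namespace, the singletonPropertyOf predicate URI, and the example facts below are illustrative assumptions; only the three-step pattern follows the abstract.

```python
# Sketch of the singleton-property pattern using rdflib. Namespace URIs and
# the example facts are illustrative; only the pattern (a per-statement
# property instance that meta triples can attach to) follows the paper.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/")
g = Graph()

# Instead of reifying the triple, mint a singleton property for it...
g.add((EX.BobDylan, EX["isMarriedTo#1"], EX.SaraLownds))
# ...declare it a singleton instance of the generic property...
g.add((EX["isMarriedTo#1"], EX.singletonPropertyOf, EX.isMarriedTo))
# ...and hang the meta knowledge directly off the singleton property.
g.add((EX["isMarriedTo#1"], EX.validFrom, Literal("1965")))
g.add((EX["isMarriedTo#1"], EX.source, URIRef("http://example.org/bio")))

print(g.serialize(format="turtle"))
```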

    Other authors
    See publication
  • Emergency-relief coordination on social media

    First Monday


    Disaster affected communities are increasingly turning to social media for communication and coordination. This includes reports on needs (demands) and offers (supplies) of resources required during emergency situations. Identifying and matching such requests with potential responders can substantially accelerate emergency relief efforts. Current work of disaster management agencies is labor intensive, and there is substantial interest in automated tools.

    We present machine–learning methods to automatically identify and match needs and offers communicated via social media for items and services such as shelter, money, clothing, etc. For instance, a message such as “we are coordinating a clothing/food drive for families affected by Hurricane Sandy. If you would like to donate, DM us” can be matched with a message such as “I got a bunch of clothes I’d like to donate to hurricane sandy victims. Anyone know where/how I can do that?” Compared to traditional search, our results can significantly improve the matchmaking efforts of disaster response agencies.
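
    A minimal sketch of the matching step, reusing the two example tweets from the abstract; TF-IDF cosine similarity stands in for the paper's trained machine-learning models, and a real system would first classify messages as needs vs. offers.

```python
# Sketch of need/offer matchmaking (not the paper's trained models): pair
# donation offers with requests by text similarity, using the abstract's
# own example tweets.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

needs  = ["we are coordinating a clothing/food drive for families affected "
          "by Hurricane Sandy. If you would like to donate, DM us"]
offers = ["I got a bunch of clothes I'd like to donate to hurricane sandy "
          "victims. Anyone know where/how I can do that?"]

vec = TfidfVectorizer(stop_words="english")
matrix = vec.fit_transform(needs + offers)
sim = cosine_similarity(matrix[: len(needs)], matrix[len(needs):])
print(f"need/offer match score: {sim[0, 0]:.2f}")
```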

    Other authors
    See publication
  • Location Prediction of Twitter Users using Wikipedia

    Kno.e.sis Technical Report

    The location of Twitter users is prominent metadata for various applications such as personalization, recommendation and disaster management on social networks. However, only 4% of Twitter users share their location information. This paper presents a knowledge-based approach to predict the locations of Twitter users. The paper is a technical report summarizing my thesis and is in the process of submission as a conference publication.

    Other authors
    See publication
  • On Understanding Divergence of Online Social Group Discussion

    AAAI, ICWSM-14

    We study online social group dynamics based on how group members diverge in their online discussions. Previous studies mostly focused on link structures to characterize social group dynamics, whereas the group behavior of content generation in discussions is not well understood. In particular, we use Jensen-Shannon (JS) divergence to measure the divergence of topics in user-generated content, and how it progresses over time. We study Twitter messages (tweets) in multiple real-world events (natural disasters and social activism) with different times and demographics. We also model structural and user features with guidance from two socio-psychological theories, social cohesion and social identity, to learn their implications for group discussion divergence. Those features show significant correlation with group discussion divergence. By leveraging them we are able to construct a classifier to predict the future increase or decrease in group discussion divergence, which achieves an area under the curve (AUC) of 0.84 and an F-1 score (harmonic mean of precision and recall) of 0.8. Our approach allows us to systematically study collective diverging group behavior independent of group formation design. It can help to prioritize whom to engage with in communities for specific topics of need during disaster response coordination, and for specific concerns and advocacy in brand management.
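
    The divergence measure named above is straightforward to compute; a sketch with SciPy follows. The topic distributions are toy values, and note that SciPy's jensenshannon returns the JS distance (the square root of the divergence).

```python
# Sketch of the divergence measurement described above: compare a group's
# topic distribution across two time windows with Jensen-Shannon divergence.
# The topic vectors are toy values; the paper's features and data are richer.
import numpy as np
from scipy.spatial.distance import jensenshannon

# P(topic) for a group's tweets in consecutive windows (each sums to 1)
window_t1 = np.array([0.50, 0.30, 0.15, 0.05])
window_t2 = np.array([0.20, 0.25, 0.35, 0.20])

# scipy returns the JS *distance* (sqrt of the divergence), so square it
js_divergence = jensenshannon(window_t1, window_t2, base=2) ** 2
print(f"JS divergence between windows: {js_divergence:.3f}")  # 0 = identical
```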

    Citation: Hemant Purohit, Yiye Ruan, Dave Fuhry, Srinivasan Parthasarathy, Amit Sheth. On Understanding Divergence of Online Social Group Discussion. In 8th Int'l AAAI Conference on Weblogs and Social Media (ICWSM 2014), June 2014

    Other authors
    See publication
  • Twitris- a System for Collective Social Intelligence

    Encyclopedia of Social Network Analysis and Mining (ESNAM), Springer

    Twitris is a Semantic Web application that facilitates understanding of social perceptions through semantics-based processing of massive amounts of event-centric data. Twitris addresses challenges in large-scale processing of social data, preserving spatio-temporal-thematic properties and focusing on multi-dimensional analysis of spatio-temporal-thematic, people-content-network and sentiment-emotion-subjectivity facets. Twitris also covers context-based semantic integration of multiple Web resources and exposes semantically enriched social data to the public domain. Semantic Web technologies enable the system's integration and analysis abilities. It has applications for studying and analyzing social sensing and perception of a broad variety of events: politics and elections, social movements and uprisings, crises and disasters, entertainment, environment, decision making and coordination, brand management, campaign effectiveness, etc.

    Other authors
    See publication
  • What Information about Cardiovascular Diseases do People Search Online?

    25th European Medical Informatics Conference (MIE 2014), Istanbul, Turkey

    In this work, we categorized cardiovascular disease (CVD) related search queries into "consumer-oriented" health categories to study which health topics users search for regarding CVD. This study provides useful insights into online health information seeking and information needs in chronic diseases, particularly CVD.

    Other authors
  • With Whom to Coordinate, Why and How in Ad-hoc Social Media Communities during Crisis Response

    ISCRAM-14

    During crises, affected people, well-wishers, and observers join social media communities to discuss the event while sharing information relevant to response coordination, for example, specific resource needs. But it is difficult to identify and engage with such users; our framework enables such coordination-assistive engagement.

    Other authors
    See publication
  • Advancing data reuse in phyloinformatics using an ontology-driven Semantic Web approach

    BMC Medical Genomics

    Phylogenetic analyses can resolve historical relationships among genes, organisms or higher taxa. Understanding such relationships can elucidate a wide range of biological phenomena, including, for example, the importance of gene and genome duplications in the evolution of gene function, the role of adaptation as a driver of diversification, or the evolutionary consequences of biogeographic shifts. Phyloinformaticists are developing data standards, databases and communication protocols (e.g. Application Programming Interfaces, APIs) to extend the accessibility of gene trees, species trees, and the metadata necessary to interpret these trees, thus enabling researchers across the life sciences to reuse phylogenetic knowledge. Specifically, Semantic Web technologies are being developed to make phylogenetic knowledge interpretable by web agents, thereby enabling intelligently automated, high-throughput reuse of results generated by phylogenetic research. This manuscript describes an ontology-driven, semantic problem-solving environment for phylogenetic analyses and introduces artefacts that can support phyloinformatic efforts to promote the accessibility of trees and underlying metadata. PhylOnt is an extensible ontology with concepts describing tree types and tree-building methodologies, including estimation methods, models and programs. In addition, we present the PhylAnt platform for annotating scientific articles and NeXML files with PhylOnt concepts. The novelty of this work is the annotation of NeXML files and phylogenetics-related documents with the PhylOnt ontology. This approach advances data reuse in phyloinformatics.

    Other authors
    See publication
  • Automatic Domain Identification for Linked Open Data

    IEEE/WIC/ACM International Conference on Web Intelligence

    Linked Open Data (LOD) has emerged as one of the largest collections of interlinked structured datasets on the Web. Although the adoption of such datasets for applications is increasing, identifying relevant datasets for a specific task or topic is still challenging. As an initial step to make such identification easier, we provide an approach to automatically identify the topic domains of given datasets. Our method utilizes existing knowledge sources, more specifically Freebase, and we present an evaluation which validates the topic domains we can identify with our system. Furthermore, we evaluate the effectiveness of identified topic domains for the purpose of finding relevant datasets, thus showing that our approach improves reusability of LOD datasets.

    Other authors
    See publication
  • Characterizing concepts of interest leveraging Linked Data and the Social Web

    Web Intelligence 2013

    Extracting and representing user interests on the Social Web is becoming an essential part of the Web for personalisation and recommendations. Such personalisation is required in order to provide an adaptive Web to users, where content fits their preferences, background and current interests, making the Web more social and relevant. Current techniques analyse user activities on social media systems and collect structured or unstructured sets of entities representing users' interests. These sets of entities, or user profiles of interest, are often missing the semantics of the entities in terms of: (i) popularity and temporal dynamics of the interests on the Social Web and (ii) abstractness of the entities in the real world. State of the art techniques to compute these values are using specific knowledge bases or taxonomies and need to analyse the dynamics of the entities over a period of time. Hence, we propose a real-time, computationally inexpensive, domain independent model for concepts of interest composed of: popularity, temporal dynamics and specificity. We describe and evaluate a novel algorithm for computing specificity leveraging the semantics of Linked Data and evaluate the impact of our model on user profiles of interests.

    Citation:
    Fabrizio Orlandi, Pavan Kapanipathi, Amit Sheth, Alexandre Passant, "Characterising Concepts of Interest Leveraging Linked Data and the Social Web," IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2013 pp. 519-526.

    Other authors
    See publication
  • Cursing in English on Twitter

    ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW'14)

    We examine the characteristics of cursing activity on Twitter, analyzing about 51 million tweets and about 14 million users. In particular, we explore a set of questions that prior studies have recognized as crucial for understanding cursing in offline communication, including the ubiquity, utility, and contextual dependencies of cursing.

    Other authors
    See publication
  • A statistical and schema independent approach to identify equivalent properties on linked data

    9th International Conference on Semantic Systems - ISEMANTICS 2013

    The Linked Open Data (LOD) cloud has gained significant attention in the Semantic Web community recently. Currently it consists of approximately 295 interlinked datasets with over 50 billion triples, including 500 million links, and continues to expand in size. This vast source of structured information has the potential to have a significant impact on knowledge-based applications. However, a key impediment to the use of the LOD cloud is limited support for data integration tasks over concepts, instances, and properties. Efforts to address this limitation over properties have focused on matching data-type properties across datasets; however, matching of object-type properties has not received similar attention. We present an approach that can automatically match object-type properties across linked datasets, primarily exploiting and bootstrapping from entity co-reference links such as owl:sameAs. Our evaluation, using sample instance sets taken from the Freebase, DBpedia, LinkedMDB, and DBLP datasets covering multiple domains, shows that our approach matches properties with high precision and recall (on average, an F-measure gain of 57%-78%).
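
    The bootstrapping idea above can be sketched in a few lines: when two instances linked by owl:sameAs have triples whose objects also co-refer, the two object-type properties become alignment candidates. The mini datasets and the absence of statistical scoring are simplifying assumptions; the paper aggregates such evidence across many instance pairs.

```python
# Illustrative sketch of bootstrapping property alignment from owl:sameAs
# links (assumed simplification of the paper's statistical approach).
from itertools import product

# owl:sameAs links between instances of dataset A and dataset B
same_as = {"A:Kubrick": "B:kubrick", "A:NYC": "B:new_york_city"}

triples_a = [("A:Kubrick", "a:bornIn", "A:NYC")]
triples_b = [("B:kubrick", "b:placeOfBirth", "B:new_york_city")]

def candidate_alignments(triples_a, triples_b, same_as):
    matches = []
    for (s1, p1, o1), (s2, p2, o2) in product(triples_a, triples_b):
        # subjects co-refer and objects co-refer -> properties likely equivalent
        if same_as.get(s1) == s2 and same_as.get(o1) == o2:
            matches.append((p1, p2))
    return matches

print(candidate_alignments(triples_a, triples_b, same_as))
# [('a:bornIn', 'b:placeOfBirth')]
```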

    Other authors
    See publication
  • Challenges in Understanding Clinical Notes: Why NLP Engines Fall Short and Where Background Knowledge Can Help

    International Workshop on Data management & Analytics for healthcaRE co-located with ACM CIKM 2013

    Understanding Electronic Medical Records (EMRs) plays a crucial role in improving healthcare outcomes. However, the unstructured nature of EMRs poses several technical challenges for extracting structured information from clinical notes for automatic analysis. While Natural Language Processing (NLP) techniques developed to process EMRs are effective for a variety of tasks, they often fail to preserve the semantics of the original information expressed in EMRs, particularly in complex scenarios. This paper illustrates the complexity of the problems involved, deals with conflicts created by the shortcomings of NLP techniques, and demonstrates where domain-specific knowledge bases can come to the rescue in resolving conflicts, which can significantly improve semantic annotation and structured information extraction. We discuss various insights gained from our study on a real-world dataset.

    Other authors
    See publication
  • Types of Property Pairs and Alignment on Linked Datasets – A Preliminary Analysis

    Proceedings of the I-SEMANTICS 2013 Posters & Demonstrations Track co-located with 9th International Conference on Semantic Systems (I-SEMANTICS 2013) Graz, Austria, September 4-6, 2013.

    Dataset publication on the Web has been greatly influenced by the Linked Open Data (LOD) project. Many interlinked datasets have become freely available on the Web, creating a structured and distributed knowledge representation. Analysis and aligning of concepts and instances in these interconnected datasets have received a lot of attention in the recent past compared to properties. We identify three different categories of property pairs found in the alignment process and study their relative distribution among well-known LOD datasets. We also provide comparative analysis of state-of-the-art techniques with regard to different categories, highlighting their capabilities. This could lead to more realistic and useful alignment of properties in LOD and similar datasets.

    Other authors
    See publication
  • Crisis Mapping, Citizen Sensing and Social Media Analytics: Leveraging Citizen Roles for Crisis Response

    AAAI, ICWSM-13 Tutorials

    This tutorial weaves together three themes and corresponding relevant topics: (a) citizen sensing and crisis mapping; (b) technical challenges and recent research for leveraging citizen sensing to improve crisis response coordination; and (c) experiences in building robust and scalable platforms/systems. It couples technical insights with identification of computational techniques and algorithms, along with real-world examples.

    Other authors
    See publication
  • Twitris v3: From Citizen Sensing to Analysis, Coordination and Action

    AAAI, ICWSM-13

    A system that leverages social media analytics beyond computing the obvious, focusing instead on targeted, action-oriented computing to assist the macro-level phenomena of coordination and decision making.

    [System Demonstration Paper]

    Other authors
    See publication
  • What Kind of #Communication is Twitter? Mining #Psycholinguistic Cues for Emergency Coordination

    Computers in Human Behavior Journal, Elsevier

    The information overload created by social media messages in emergency situations challenges response organizations to find targeted content and users. We aim to select useful messages by detecting the presence of conversation as an indicator of coordinated citizen action. Using simple linguistic indicators associated with conversation analysis in social science, we model the presence of conversation in the communication landscape of Twitter in a large corpus of 1.5M tweets for various disaster and non-disaster events spanning different periods, lengths of time and varied social significance. Within Replies, Retweets and tweets that mention other Twitter users, we found that domain-independent, linguistic cues distinguish likely conversation from non-conversation in this online (mediated) communication. We demonstrate that conversation subsets within Replies, Retweets and tweets that mention other Twitter users potentially contain more information than non-conversation subsets. Information density also increases for tweets that are not Replies, Retweets or mentioning other Twitter users, as long as they reflect conversational properties. From a practical perspective, we have developed a model for trimming the candidate tweet corpus to identify a much smaller subset of data for submission to deeper, domain-dependent semantic analyses for the identification of actionable information nuggets for coordinated emergency response.

    Other authors
    See publication
  • Comparative Trust Management with Applications: Bayesian Approaches Emphasis

    Journal of Future Generation Computer Systems (FGCS)

    Trust relationships occur naturally in many diverse contexts such as collaborative systems, e-commerce, interpersonal interactions, social networks, and the semantic sensor web. As agents providing content and services become increasingly removed from the agents that consume them, the issue of robust trust inference and update becomes critical. There is a need to find online substitutes for additional (direct or face-to-face) cues to derive measures of trust, and to create efficient and robust systems for managing trust in order to support decision making. Unfortunately, there is neither a universal notion of trust that is applicable to all domains nor a clear explication of its semantics or computation in many situations. We motivate the trust problem, explain the relevant concepts, summarize research in modeling trust and gleaning trustworthiness, and discuss the challenges confronting us. The goal is to provide a comprehensive, broad overview of the trust landscape, with the nitty-gritties of a handful of approaches. We also provide details of the theoretical underpinnings and a comparative analysis of Bayesian approaches to binary and multilevel trust, to automatically determine trustworthiness in a variety of reputation systems including those used in sensor networks, e-commerce, and collaborative environments.

    Keywords: Trust vs. reputation; Trust ontology; Gleaning trustworthiness; Trust metrics and models (propagation: chaining and aggregation); Social and sensor networks; Collaborative systems; Trust system attacks; Beta-PDF; Dirichlet distribution; Binary and multi-level trust
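
    For a concrete flavor of the Bayesian machinery surveyed here, a minimal Python sketch of the classic Beta-PDF reputation estimate of binary trust from interaction counts; the uniform prior and the example counts are illustrative choices, not details from the article:

        def beta_trust(positives: int, negatives: int) -> float:
            # Expected value of Beta(a, b) under a uniform Beta(1, 1) prior
            a = positives + 1
            b = negatives + 1
            return a / (a + b)

        print(beta_trust(8, 2))  # 0.75 after 8 good and 2 bad interactions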

    Citation: Krishnaprasad Thirunarayan, Pramod Anantharam, Cory Henson, Amit Sheth
    Comparative trust management with applications: Bayesian approaches emphasis
    Future Generation Computer Systems, Volume 31, February 2014, Pages 182–199
    http://dx.doi.org/10.1016/j.future.2013.05.006

    Access: http://www.knoesis.org/library/resource.php?id=1875

    Other authors
    See publication
  • Physical-Cyber-Social Computing: An Early 21st Century Approach

    IEEE Intelligent Systems

    Visionaries and scientists from the early days of computing and electronic communication have discussed the proper role of technology to improve human experience. Technology now plays an increasingly important role in facilitating and improving personal and social activities and engagements, decision making, interaction with physical and social worlds, generating insights, and just about anything that a human, as an intelligent being, seeks to do. This article presents a vision of Physical-Cyber-Social (PCS) computing for a holistic treatment of data, information, and knowledge from physical, cyber, and social worlds to integrate, understand, correlate, and provide contextually relevant abstractions to humans.

    Cite: Amit Sheth, Pramod Anantharam, Cory Henson, "Physical-Cyber-Social Computing: An Early 21st Century Approach," IEEE Intelligent Systems, vol. 28, no. 1, pp. 78-82, Jan.-Feb., 2013

    Access: http://knoesis.wright.edu/library/resource.php?id=1816

    More: http://wiki.knoesis.org/index.php/PCS

    Other authors
    See publication
  • Semantics Driven Approach for Knowledge Acquisition from EMRs

    Journal of Biomedical and Health Informatics

    Semantic computing technologies have matured to be applicable to many critical domains such as national security, life sciences, and health care. However, the key to their success is the availability of a rich domain knowledge base. The creation and refinement of domain knowledge bases poses difficult challenges. The existing knowledge bases in the health care domain are rich in taxonomic relationships, but they lack non-taxonomic (domain) relationships. In this paper, we describe a semi-automatic technique for enriching existing domain knowledge bases with causal relationships gleaned from Electronic Medical Records (EMR) data. We determine missing causal relationships between domain concepts by validating domain knowledge against EMR data sources and leveraging semantic-based techniques to derive plausible relationships that can rectify knowledge gaps. Our evaluation demonstrates that semantic techniques can be employed to improve the efficiency of knowledge acquisition.
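
    A hypothetical sketch of the validation step, with invented concept names, records, and support threshold (the paper's semantics-based technique is considerably richer):

        # Toy records: each set holds the concepts found in one patient's EMR;
        # a candidate relationship between two concepts is kept only if enough
        # records support it (the threshold here is arbitrary).
        records = [
            {"hypertension", "headache"},
            {"hypertension", "headache", "nausea"},
            {"diabetes", "fatigue"},
        ]

        def supported(c1, c2, min_support=2):
            return sum(1 for r in records if c1 in r and c2 in r) >= min_support

        print(supported("hypertension", "headache"))  # True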

    Other authors
    See publication
  • Are Twitter Users Equal in Predicting Elections? A Study of User Groups in Predicting 2012 U.S. Republican Presidential Primaries

    In Proceedings of the Fourth International Conference on Social Informatics (SocInfo'12)

    Existing studies on predicting election results operate under the assumption that all users should be treated equally. However, recent work [14] shows that social media users from different groups (e.g., 'silent majority' vs. 'vocal minority') differ significantly in the content they generate and in their tweeting behavior. The effect of these differences on predicting election results has not been exploited yet. In this paper, we study the spectrum of Twitter users who participate in the online discussion of the 2012 U.S. Republican Presidential Primaries, and examine the predictive power of different user groups (e.g., highly engaged users vs. lowly engaged users, right-leaning users vs. left-leaning users) against Super Tuesday primaries in 10 states. Specifically, we characterize users across four dimensions, including three dimensions of user participation measured by tweet-based properties (engagement degree, tweet mode, and content type) and one dimension of users' political preference. We study different groups of users in each dimension and compare them on the task of electoral prediction. The insights gained in this study can shed light on improving social media based prediction from the user sampling perspective and more.

    Presentation at: http://www.slideshare.net/knoesis/are-twitter-users-equal-in-predicting-elections-insights-from-republican-primaries-and-2012-general-election

    Other authors
    See publication
  • An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices

    11th International Semantic Web Conference

    The primary challenge of machine perception is to define efficient computational methods to derive high-level knowledge from low-level sensor observation data. Emerging solutions are using ontologies for expressive representation of concepts in the domain of sensing and perception, which enable advanced integration and interpretation of heterogeneous sensor data. The computational complexity of OWL, however, seriously limits its applicability and use within resource-constrained environments, such as mobile devices. To overcome this issue, we employ OWL to formally define the inference tasks needed for machine perception - explanation and discrimination - and then provide efficient algorithms for these tasks, using bit-vector encodings and operations. The applicability of our approach to machine perception is evaluated on a smart-phone mobile device, demonstrating dramatic improvements in both efficiency and scale.
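
    As a toy illustration of the bit-vector idea (not the paper's implementation), the sketch below encodes the observations each candidate can explain as bits and tests explanation with a single AND per candidate; the symptom vocabulary and knowledge base are invented:

        SYMPTOMS = ["fever", "cough", "rash"]              # invented vocabulary
        IDX = {s: 1 << i for i, s in enumerate(SYMPTOMS)}  # one bit per symptom

        def encode(symptoms):
            v = 0
            for s in symptoms:
                v |= IDX[s]
            return v

        # Each candidate explanation covers a set of possible observations.
        KB = {"flu": encode(["fever", "cough"]), "measles": encode(["fever", "rash"])}
        observed = encode(["fever", "cough"])

        # A candidate explains the observations iff the observed bits are a
        # subset of its bits: one AND and one comparison per candidate.
        print([d for d, bits in KB.items() if observed & bits == observed])  # ['flu']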

    Other authors
    See publication
  • 'I just wanted to tell you that loperamide WILL WORK': A Web-Based Study of Extra-Medical Use of Loperamide.

    Journal of Drug and Alcohol Dependence

    Aims: Many websites provide a means for individuals to share their experiences and knowledge about different drugs. Such User-Generated Content (UGC) can be a rich data source to study emerging drug use practices and trends. This study examined UGC on extra-medical use of loperamide (e.g., Imodium® A-D) among illicit opioid users.

    Methods: A website that allows for the free discussion of illicit drugs and is accessible for public viewing was selected for analysis. Web-forum posts were retrieved using Web Crawlers and retained in a local text database. The database was queried to extract posts with a mention of loperamide and relevant brand/slang terms. Over 1,290 posts were identified. A random sample of 258 posts was coded using NVivo to identify intent, dosage, and side-effects of loperamide use.

    Results: There has been an increase in discussions related to loperamide's use by non-medical opioid users, especially in 2010-2011. Loperamide was primarily discussed as a remedy to alleviate a broad range of opiate withdrawal symptoms, and was sometimes referred to as 'poor man's methadone'. Typical doses ranged from 70 to 100 mg per day, much higher than the indicated daily dose of 16 mg.

    Conclusions: This study suggests that loperamide is being used extra-medically to self-treat opioid withdrawal symptoms. There is a growing demand among people who are opioid dependent for drugs to control withdrawal symptoms, and loperamide appears to fit that role. The study also highlights the potential of the Web as a 'leading edge' data source in identifying emerging drug use practices.

    Other authors
    See publication
  • Data Driven Knowledge Acquisition Method for Domain Knowledge Enrichment in Healthcare

    BIBM 2012, Philadelphia

    Semantic computing technologies have matured to be applicable to many critical domains, such as life sciences and health care. However, the key to their success is the rich domain knowledge which consists of domain concepts and relationships, whose creation and refinement remains a challenge. In this paper, we develop a technique for enriching domain knowledge, focusing on populating the domain relationships. We determine missing relationships between the domain concepts by validating domain knowledge against real world data sources. We evaluate our approach in the healthcare domain using Electronic Medical Record (EMR) data, and demonstrate that semantic techniques can be used to semi-automate labour-intensive tasks without sacrificing the fidelity of domain knowledge.

    Other authors
    See publication
  • PhylOnt: A Domain-Specific Ontology for Phylogeny Analysis

    IEEE International Conference on Bioinformatics and Biomedicine

    Phylogenetic analyses can resolve historical relationships among genes, organisms or higher taxa. Understanding such relationships can elucidate a wide range of biological phenomena, including the role of adaptation as a driver of diversification, the importance of gene and genome duplications in the evolution of gene function, or the evolutionary consequences of biogeographic shifts. The variety of methods of analysis and data types typically employed in phylogenetic analyses can pose challenges for semantic reasoning due to significant representational and computational complexity. These challenges could be ameliorated with the development of an ontology designed to capture and organize the variety of concepts used to describe phylogenetic data, methods of analysis and the results of phylogenetic analyses. In this paper, we discuss the development of PhylOnt - an ontology for phylogenetic analyses, which establishes a foundation for semantics-based workflows including meta-analyses of phylogenetic data and trees. PhylOnt is an extensible ontology, which describes the methods employed to estimate trees given a data matrix, models and programs used for phylogenetic analysis, and descriptions of phylogenetic trees including branch-length information and support values. The relational vocabulary included in PhylOnt will facilitate the integration of heterogeneous data types derived from both structured and unstructured sources. To illustrate the utility of PhylOnt, we annotated scientific literature to support semantic search. The semantic annotations can subsequently support workflows that require the exchange and integration of heterogeneous phylogenetic information.

    Panahiazar, M.; Ranabahu, A.; Taslimi, V.; Yalamanchili, H.; Stoltzfus, A.; Leebens-Mack, J.; Sheth, A.P., "PhylOnt: A domain-specific ontology for phylogeny analysis," 2012 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 4-7 Oct. 2012, pp.1-6.
    doi: 10.1109/BIBM.2012.6392677

    Other authors
    See publication
  • Alignment Based Querying of Linked Open Data

    Springer

    The Linked Open Data (LOD) cloud is rapidly becoming the largest interconnected source of structured data on diverse domains. The potential of the LOD cloud is enormous, ranging from solving challenging AI issues such as open domain question answering to automated knowledge discovery. However, due to the inherently distributed nature of LOD and the growing number of ontologies and vocabularies used in LOD datasets, querying over multiple datasets and retrieving LOD data remains a challenging task. In this paper, we propose a novel approach to querying linked data by using alignments for processing queries whose constituent data come from heterogeneous sources. We also report on our Alignment based Linked Open Data Querying System (ALOQUS) and present the architecture and associated methods. Using the state-of-the-art alignment system BLOOMS, ALOQUS automatically maps concepts in users’ SPARQL queries, written in terms of a conceptual upper ontology or domain specific ontology, to different LOD concepts and datasets. It then creates a query plan, sends sub-queries to the different endpoints, crawls for co-referent URIs, merges the results and presents them to the user. We also compare the existing querying systems and demonstrate the added capabilities that the alignment based approach can provide for querying Linked Data.
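
    The snippet below illustrates only the sub-query dispatch step of such a system, using the SPARQLWrapper library; the endpoint and query are examples, and ALOQUS's own query planning, co-reference crawling and merging are not shown:

        from SPARQLWrapper import SPARQLWrapper, JSON

        def ask(endpoint: str, query: str):
            # Send one sub-query to one endpoint and return its bindings
            s = SPARQLWrapper(endpoint)
            s.setQuery(query)
            s.setReturnFormat(JSON)
            return s.query().convert()["results"]["bindings"]

        q = "SELECT ?c WHERE { ?c a <http://dbpedia.org/ontology/City> } LIMIT 5"
        print([r["c"]["value"] for r in ask("https://dbpedia.org/sparql", q)])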

    Citation: On the Move to Meaningful Internet Systems: OTM 2012
    Lecture Notes in Computer Science Volume 7566, 2012, pp 807-824

    Other authors
    See publication
  • Harnessing Twitter ‘Big Data’ for Automatic Emotion Identification

    2012 ASE International Conference on Social Computing, SocialCom 2012

    User generated content on Twitter (produced at an enormous rate of 340 million tweets per day) provides a rich source for gleaning people's emotions, which is necessary for deeper understanding of people's behaviors and actions. Extant studies on emotion identification lack comprehensive coverage of 'emotional situations' because they use relatively small training datasets. To overcome this bottleneck, we have automatically created a large emotion-labeled dataset (of about 2.5 million tweets) by harnessing emotion-related hashtags available in the tweets. We have applied two different machine learning algorithms for emotion identification, to study the effectiveness of various feature combinations as well as the effect of the size of the training data on the emotion identification task. Our experiments demonstrate that a combination of unigrams, bigrams, sentiment/emotion-bearing words, and parts-of-speech information is most effective for gleaning emotions. The highest accuracy (65.57%) is achieved with training data containing about 2 million tweets.
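
    A hedged sketch in the spirit of the approach: emotion labels come from hashtags that would be stripped from the text before training, and unigram/bigram features feed a standard classifier. The two-tweet corpus is obviously synthetic:

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # Labels stand in for harvested hashtags (#joy, #anger) removed from the text
        tweets = ["so happy about the game tonight", "this traffic makes me furious"]
        labels = ["joy", "anger"]

        # Unigrams + bigrams, one of the feature combinations the paper studies
        clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
        clf.fit(tweets, labels)
        print(clf.predict(["happy happy happy"]))  # expected: ['joy']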

    Other authors
    See publication
  • A Web-Based Study of Self-Treatment of Opioid Withdrawal Symptoms with Loperamide

    The College on Problems of Drug Dependence (CPDD)

    Aims: Many websites provide a medium for individuals to freely share their experiences and knowledge about different drugs. Such user-generated content can be used as a rich data source to study emerging drug use practices and trends. The study aims to examine web-based reports of loperamide use practices among non-medical opioid users. Loperamide, a piperidine derivative, is an opioid agonist approved for the control of diarrhea symptoms. Because of its general inability to cross the blood-brain barrier, it is considered to have no abuse potential and is available without a prescription.

    Methods: A website that allows free discussion of illicit drugs and is accessible for public viewing was selected for analysis. Web-forum posts were retrieved using Web Crawlers and retained in an Informal Text Database. All unique user names were anonymized. The database was queried to extract posts with a mention of loperamide and relevant brand/slang terms. Over 1200 posts were identified and entered into NVivo to assist with consistent application of codes related to the reasons, dosage, and effects of loperamide use.

    Results: Since the first post in 2005, there was a substantial rise in discussions related to its use by non-medical opioid users, especially in 2009-2011. Loperamide was primarily discussed as a remedy to alleviate a broad range of opiate withdrawal symptoms, and was sometimes referred to as 'poor man's methadone.' Typical doses frequently ranged from 100 mg to 200 mg per day, much higher than the indicated dose of 16 mg per day.

    Conclusions: This study suggests that loperamide is being used extra-medically by people who are involved with the abuse of opioids to control withdrawal symptoms. There is a growing demand among people who are opioid dependent for drugs to control withdrawal symptoms, and loperamide appears to fit that role. The study also highlights the potential of the Web as a 'leading edge' data source in identifying emerging drug use practices.

    Other authors
    See publication
  • Finding Influential Authors in Brand-Page Communities

    6th Int'l AAAI Conference on Weblogs and Social Media (ICWSM)

    Enterprises are increasingly using social media forums to engage with their customers online, a phenomenon known as Social Customer Relationship Management (Social CRM). In this context, it is important for an enterprise to identify “influential authors” and engage with them on a priority basis. We present a study towards finding influential authors on Twitter forums, where an implicit network based on user interactions is created and analyzed. Furthermore, author profile features and user interaction features are combined in a decision tree classification model for finding influential authors. A novel objective evaluation criterion is used for evaluating various features and modeling techniques. We compare our methods with other approaches that use either only the formal connections or only the author profile features, and show a significant improvement in classification accuracy over these baselines as well as over using the Klout score.
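
    A hedged sketch of the classification setup described above, with invented features and labels (the paper's feature set and evaluation are far richer):

        from sklearn.tree import DecisionTreeClassifier

        # Invented features per author: [followers, replies received, retweets
        # received]; labels (1 = influential) are assumed given, as in a
        # supervised setup.
        X = [[120, 4, 2], [15000, 60, 340], [300, 1, 0], [8000, 22, 90]]
        y = [0, 1, 0, 1]

        model = DecisionTreeClassifier(max_depth=2).fit(X, y)
        print(model.predict([[5000, 30, 120]]))  # likely [1] for this toy tree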

    Other authors
    See publication
  • Prediction of Topic Volume on Twitter

    4th Int'l ACM Conference of Web Science (WebSci)

    [Extended Abstract] We discuss an approach for predicting microscopic (individual) and macroscopic (collective) user behavioral patterns with respect to specific trending topics on Twitter. Going beyond previous efforts that have analyzed driving factors in whether and when a user will publish topic-relevant tweets, here we seek to predict the strength of content generation, which allows more accurate understanding of Twitter users’ behavior and more effective utilization of the online social network for diffusing information. Unlike traditional approaches, we combine multiple dimensions into one regression-based prediction framework covering network structure, user interaction, content characteristics and past activity. Experimental results on three large Twitter datasets demonstrate the efficacy of our proposed method. We find in particular that combining features from multiple aspects (especially past activity information and network features) yields the best performance. Furthermore, we observe that leveraging more past information leads to better prediction performance, although the marginal benefit is diminishing.
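
    A minimal sketch of the multi-dimensional regression idea, assuming invented features and synthetic numbers:

        from sklearn.linear_model import LinearRegression

        # Invented per-user features: [followers, mentions received, past tweets
        # on the topic, topical words used]; y is the user's tweet volume on the
        # topic in the next time window. All numbers are synthetic.
        X = [[200, 3, 5, 12], [5000, 40, 33, 80], [50, 0, 1, 2]]
        y = [4, 28, 0]

        reg = LinearRegression().fit(X, y)
        print(reg.predict([[1000, 10, 8, 20]]))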

    Other authors
    See publication
  • Extracting Diverse Sentiment Expressions with Target-dependent Polarity from Twitter

    In Proceedings of the 6th International AAAI Conference on Weblogs and Social Media (ICWSM)

    The problem of automatic extraction of sentiment expressions from informal text, as in microblogs such as tweets, is a recent area of investigation. Compared to formal text, such as that in product reviews or news articles, one of the key challenges lies in the wide diversity and informal nature of sentiment expressions, which cannot be trivially enumerated or captured using predefined lexical patterns. In this work, we present an optimization-based approach to automatically extract sentiment expressions for a given target (e.g., movie, or person) from a corpus of unlabeled tweets. Specifically, we make three contributions: (i) we recognize a diverse and richer set of sentiment-bearing expressions in tweets, including formal and slang words/phrases, not limited to pre-specified syntactic patterns; (ii) instead of associating sentiment with an entire tweet, we assess the target-dependent polarity of each sentiment expression, where the polarity of a sentiment expression is determined by the nature of its target; (iii) we provide a novel formulation of assigning polarity to a sentiment expression as a constrained optimization problem over the tweet corpus. Experiments conducted on two domains, tweets mentioning movie and person entities, show that our approach improves accuracy in comparison with several baseline methods, and that the improvement becomes more prominent with increasing corpus sizes.

    Other authors
    See publication
  • Semantic Perception: Converting Sensory Observations to Abstractions

    IEEE Internet Computing, Special Issue on Context-Aware Computing: Beyond Search and Location-Based Services

    An abstraction is a representation of an environment derived from sensor observation data. Generating an abstraction requires inferring explanations from an incomplete set of observations (often from the Web) and updating these explanations on the basis of new information. This process must be fast and efficient. The authors' approach overcomes these challenges to systematically derive abstractions from observations. The approach models perception through the integration of an abductive logic framework called Parsimonious Covering Theory with Semantic Web technologies. The authors demonstrate this approach's utility and scalability through use cases in the healthcare and weather domains.

    Other authors
    See publication
  • Framework for the Analysis of Coordination in Crisis Response

    Collaboration & Crisis Informatics, CSCW-2012

    Social Media play a critical role during crisis events, revealing a natural coordination dynamic. We propose a computational framework guided by social science principles to measure, analyze, and understand coordination among the different types of organizations and actors in crisis response. The analysis informs both the scientific account of cooperative behavior and the design of applications and protocols to support crisis management.

    Other authors
    See publication
  • A Semantic Problem Solving Environment for Integrative Parasite Research: Identification of Intervention Targets for Trypanosoma cruzi

    PLoS Neglected Tropical Diseases

    Research on the biology of parasites requires a sophisticated and integrated computational platform to query and analyze large volumes of data, representing both unpublished (internal) and public (external) data sources. Effective analysis of an integrated data resource using knowledge discovery tools would significantly aid biologists in conducting their research, for example, through identifying various intervention targets in parasites and in deciding the future direction of ongoing as well as planned projects. A key challenge in achieving this objective is the heterogeneity between the internal lab data, usually stored as flat files, Excel spreadsheets or custom-built databases, and the external databases. Reconciling the different forms of heterogeneity and effectively integrating data from disparate sources is a nontrivial task for biologists and requires a dedicated informatics infrastructure. Thus, we developed an integrated environment using Semantic Web technologies that may provide biologists the tools for managing and analyzing their data, without the need for acquiring in-depth computer science knowledge.

    Citation: Priti P. Parikh, Todd A. Minning, Vinh Nguyen, Sarasi Lalithsena, Amir H. Asiaee, Satya S. Sahoo, Prashant Doshi, Rick Tarleton, and Amit P. Sheth. 'A Semantic Problem Solving Environment for Integrative Parasite Research: Identification of Intervention Targets for Trypanosoma cruzi.' PLoS Negl Trop Dis 6(1): e1458. doi:10.1371/journal.pntd.0001458, 2012. PMID: 22272365

    Other authors
    See publication
  • An Up-to-date Knowledge-Based Literature Search and Exploration Framework for Focused Bioscience Domains

    2nd ACM SIGHIT Intl Health Informatics Symposium, IHI 2012

    First Author: Ramakanth Kavuluru

    To handle the exponential growth in bioscience literature, several knowledge-based search systems that facilitate domain-specific search have been proposed. In such systems, knowledge of a domain of interest is embedded as a backbone that guides the search process. But the knowledge used in most such systems (1) exists only for a few well-known broad domains; (2) is of a basic nature: either purely hierarchical or involving only a few relationship types; and (3) is not always kept up-to-date, missing insights from recently published results. In this paper we present a framework and implementation of a focused and up-to-date knowledge-based search system, called Scooner, that utilizes domain-specific knowledge extracted from recent bioscience abstracts. To our knowledge, this is the first attempt in the field to address all three shortcomings mentioned above. Since its recent introduction for operational use at the Applied Biotechnology Branch of AFRL, some biologists have been using Scooner on a regular basis, while it is being made available for use by many more. Initial evaluations point to the promise of the approach in addressing the challenge we set out to address.

    Other authors
    See publication
  • Discovering Fine-grained Sentiment in Suicide Notes

    Journal of Biomedical Informatics Insights

    This paper presents our solution for the i2b2 sentiment classification challenge. Our hybrid system consists of machine learning and rule-based classifiers. For the machine learning classifier, we investigate a variety of lexical, syntactic and knowledge-based features, and show how much these features contribute to the performance of the classifier through experiments. For the rule-based classifier, we propose an algorithm to automatically extract effective syntactic and lexical patterns from training examples. The experimental results show that the rule-based classifier outperforms the baseline machine learning classifier using unigram features. By combining the machine learning classifier and the rule-based classifier, the hybrid system gains a better trade-off between precision and recall, and yields the highest micro-averaged F-measure (0.5038), which is better than the mean (0.4875) and median (0.5027) micro-average F-measures among all participating teams.
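
    For reference, the micro-averaged F-measure quoted above pools true positives, false positives, and false negatives across all classes before computing F1; the counts below are invented for illustration:

        # Pool counts over all classes, then F1 = 2PR / (P + R)
        tp, fp, fn = 40, 35, 44          # invented pooled counts
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        print(round(f1, 4))              # ~0.5031 on these toy counts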

    Other authors
    See publication
  • An Ontological Approach to Focusing Attention and Enhancing Machine Perception on the Web

    Journal of Applied Ontology

    Today, many sensor networks and their applications employ a brute force approach to collecting and analyzing sensor data. Such an approach often wastes valuable energy and computational resources by unnecessarily tasking sensors and generating observations of minimal use. People, on the other hand, have evolved sophisticated mechanisms to efficiently perceive their environment. One such mechanism includes the use of background knowledge to determine which aspects of the environment to focus our attention on. In this paper, we develop an ontology of perception, IntellegO, that may be used to more efficiently convert observations into perceptions. IntellegO is derived from cognitive theory, encoded in set theory, and provides a formal semantics of machine perception. We then present an implementation that iteratively and efficiently processes low-level, heterogeneous sensor data into knowledge through use of the perception ontology and domain-specific background knowledge. Finally, we evaluate IntellegO by collecting and analyzing observations of weather conditions on the Web, and show significant resource savings in the generation and storage of perceptual knowledge.

    Other authors
    See publication
  • A unified framework for managing provenance information in translational research.

    BMC Bioinformatics

    BACKGROUND:

    A critical aspect of the NIH Translational Research roadmap, which seeks to accelerate the delivery of "bench-side" discoveries to patient's "bedside," is the management of the provenance metadata that keeps track of the origin and history of data resources as they traverse the path from the bench to the bedside and back. A comprehensive provenance framework is essential for researchers to verify the quality of data, reproduce scientific results published in peer-reviewed literature, validate scientific process, and associate trust value with data and results. Traditional approaches to provenance management have focused on only partial sections of the translational research life cycle and they do not incorporate "domain semantics", which is essential to support domain-specific querying and analysis by scientists.

    RESULTS:
    We identify a common set of challenges in managing provenance information across the pre-publication and post-publication phases of data in the translational research lifecycle. We define the semantic provenance framework (SPF), underpinned by the Provenir upper-level provenance ontology, to address these challenges in the four stages of provenance metadata: (a) provenance collection, during data generation; (b) provenance representation, to support interoperability and reasoning, and to incorporate domain semantics; (c) provenance storage and propagation, to allow efficient storage and seamless propagation of provenance as the data is transferred across applications; and (d) provenance query, to support queries of increasing complexity over large data sizes as well as knowledge discovery applications. We apply the SPF to two exemplar translational research projects, namely the Semantic Problem Solving Environment for Trypanosoma cruzi (T.cruzi SPSE) and the Biomedical Knowledge Repository (BKR) project, to demonstrate its effectiveness.

    Access: http://www.knoesis.org/library/resource.php?id=1632

    Cite: doi:10.1186/1471-2105-12-461. PMID: 22126369 [Highly Accessed]

    Other authors
    See publication
  • Semantic Predications for Complex Information Needs in Biomedical Literature

    International Conference on Bioinformatics and Biomedicine - BIBM 2011

    Many complex information needs that arise in biomedical disciplines require exploring multiple documents in order to obtain information. While traditional information retrieval techniques that return a single ranked list of documents are quite common for such tasks, they may not always be adequate. The main issue is that ranked lists typically impose a significant burden on users to filter out irrelevant documents. Additionally, users must intuitively reformulate their search query when relevant documents have not been highly ranked. Furthermore, even after interesting documents have been selected, very few mechanisms exist that enable document-to-document transitions. In this paper, we demonstrate the utility of assertions extracted from biomedical text (called semantic predications) to facilitate retrieving relevant documents for complex information needs. Our approach offers an alternative to query reformulation by establishing a framework for transitioning from one document to another. We evaluate this novel knowledge-driven approach using precision and recall metrics on the 2006 TREC Genomics Track.

    Other authors
    See publication
  • A Domain Specific Language for Enterprise Grade Cloud-Mobile Hybrid Applications

    11th Workshop on Domain-Specific Modeling (DSM)

    Cloud computing has changed the technology landscape by offering flexible and economical computing resources to the masses. However, vendor lock-in makes the migration of applications and data across clouds an expensive proposition. The lock-in is especially serious when considering the new technology trend of combining cloud with mobile devices. In this paper, we present a domain-specific language (DSL) that is purposely created for generating hybrid applications spanning across mobile devices as well as computing clouds. We propose a model-driven development process that makes use of a DSL to provide sufficient programming abstractions over both cloud and mobile features. We describe the underlying domain modeling strategy as well as the details of our language and the tools supporting our approach.
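
    The paper's DSL syntax is not reproduced here; the purely invented Python toy below only illustrates the underlying model-driven idea of one abstract specification generating artifacts for both the cloud and the mobile sides:

        # Toy "specify once, generate for both targets" illustration
        spec = {"service": "NoteStore", "operations": ["add", "list"]}

        def gen_cloud(s):
            # Hypothetical cloud-side routes derived from the spec
            return "\n".join(f"POST /{s['service']}/{op}" for op in s["operations"])

        def gen_mobile(s):
            # Hypothetical mobile-side client stubs derived from the same spec
            svc = s["service"]
            return "\n".join(f"def {op}_remote(): call('/{svc}/{op}')"
                             for op in s["operations"])

        print(gen_cloud(spec))
        print(gen_mobile(spec))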

    Other authors
    See publication
  • Demonstration: Real-Time Semantic Analysis of Sensor Streams

    Proceedings of the 4th International Workshop on Semantic Sensor Networks

    The emergence of dynamic information sources – including sensor networks – has led to large streams of real-time data on the Web. Research studies suggest that these dynamic networks have created more data in the last three years than in the entire history of civilization, and this trend will only increase in the coming years. With this coming data explosion, real-time analytics software must either adapt or die. This paper focuses on the task of integrating and analyzing multiple heterogeneous streams of sensor data with the goal of creating meaningful abstractions, or features. These features are then temporally aggregated into feature streams. We will demonstrate an implemented framework, based on Semantic Web technologies, that creates feature streams from sensor streams in real-time, and publishes these streams as Linked Data. The generation of feature streams can be accomplished in reasonable time and results in massive data reduction.
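
    A toy sketch of the data-reduction idea, assuming invented thresholds, names, and triple layout rather than the framework's actual schema:

        # Ten minutes of synthetic temperature readings collapse into a single
        # semantic feature over the time window.
        readings = [(t, 31.0 + 0.1 * t) for t in range(10)]   # (minute, deg F)

        def to_feature(window):
            avg = sum(v for _, v in window) / len(window)
            return "FreezingCondition" if avg <= 32.0 else "NormalCondition"

        # Emit the feature as a Linked-Data-style triple (layout illustrative)
        print(f"ex:obsWindow1 ex:detectedFeature ex:{to_feature(readings)} .")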

    Other authors
    See publication
  • Semantic Annotation and Search for resources in the next GenerationWeb with SA-REST

    W3C Workshop on Data and Services Integration, October 20-21, 2011, Bedford, MA, USA.

    SA-REST, the W3C member submission, can be used for supporting a wide variety of Plain Old Semantic HTML (POSH) annotation capabilities on any type of Web resource. The Kino framework and tools provide the capabilities needed to realize SA-REST's promised value. These tools include (a) a browser plugin to support annotation of a Web resource (including services) with respect to an ontology, domain model or vocabulary, (b) an annotation-aware indexing engine and (c) faceted search and selection of the Web resources. At one end of the spectrum, we present KinoE (aka Kino for Enterprise), which uses NCBO formal ontologies and associated services for searching ontologies and mappings, and for annotating RESTful services and Web APIs, which are then used to support faceted search. At the other end of the spectrum, we present KinoW (aka Kino for the Web), capable of adding SA-REST or Microdata annotations to Web pages, using Schema.org as a model and Linked Open Data (LOD) as a knowledge base. We also present two use cases based on KinoE and the benefits to data and service integration enabled through this annotation approach.

    Other authors
    See publication
  • Understanding User-Community Engagement by Multi-faceted Features: A Case Study on Twitter

    SoME-2011 Workshop on Social Media Engagement, in conjunction with WWW-2011

    The widespread use of social networking websites in recent years has suggested a need for effective methods to understand the new forms of user engagement, the factors impacting them, and the fundamental reasons for such engagement. We perform exploratory analysis on Twitter to understand the dynamics of user engagement by studying what attracts a user to participate in discussions on a topic. We identify various factors which might affect user engagement, ranging from content properties and network topology to user characteristics on the social network, and use them to predict user joining behavior. As opposed to traditional ways of studying them separately, these factors are organized in our framework, People-Content-Network Analysis (PCNA), mainly designed to enable understanding of human social dynamics on the web. We perform experiments on various Twitter user communities formed around topics from diverse domains, with varied social significance, duration and spread. Our findings suggest that the predictive power of content, user and network features varies greatly, motivating the incorporation of all three factors in user engagement analysis and, hence, the use of the PCNA framework to study the dynamics of user engagement. Our study also reveals certain correlations between the type of event behind a discussion topic and the impact of the user engagement factors.

    Other authors
    See publication
  • Flexible Bootstrapping-Based Ontology Alignment

    The Fifth International Workshop on Ontology Matching collocated with the 9th International Semantic Web Conference ISWC-2010, November 7, 2010

    BLOOMS (Jain et al., ISWC 2010) is an ontology alignment system which, at its core, utilizes the Wikipedia category hierarchy for establishing alignments. In this paper, we present a Plug-and-Play extension to BLOOMS, which allows one to flexibly replace or complement the use of Wikipedia with other online or offline resources, including domain-specific ontologies or taxonomies. By making use of automated translation services and of Wikipedia in languages other than English, it becomes possible to apply BLOOMS to alignment tasks where the input ontologies are written in different languages.

    Other authors
    See publication
  • Ontology Alignment for Linked Open Data.

    9th International Semantic Web Conference 2010 (ISWC 2010),

    The Web of Data currently coming into existence through the Linked Open Data (LOD) effort is a major milestone in realizing the Semantic Web vision. However, the development of applications based on LOD faces difficulties due to the fact that the different LOD datasets are rather loosely connected pieces of information. In particular, links between LOD datasets are almost exclusively on the level of instances, and schema-level information is being ignored. In this paper, we therefore present a system for finding schema-level links between LOD datasets in the sense of ontology alignment. Our system, called BLOOMS, is based on the idea of bootstrapping information already present on the LOD cloud. We also present a comprehensive evaluation which shows that BLOOMS outperforms state-of-the-art ontology alignment systems on LOD datasets. At the same time, BLOOMS is also competitive compared with these other systems on the Ontology Alignment Evaluation Initiative benchmark datasets.
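
    The bootstrapping intuition can be sketched as follows; the category sets and the overlap measure below are illustrative stand-ins, not the exact BLOOMS trees or scoring:

```python
# Toy sketch of the bootstrapping idea behind BLOOMS (not the actual system):
# each class name is expanded into a set of Wikipedia-style category ancestors,
# and the overlap of the two sets suggests an alignment. Categories are made up.
def overlap(source_tree: set, target_tree: set) -> float:
    """Fraction of the source class's category tree shared with the target's."""
    if not source_tree:
        return 0.0
    return len(source_tree & target_tree) / len(source_tree)

categories = {  # hypothetical category ancestors per class name
    "dbpedia:Stream": {"Bodies of water", "Rivers", "Hydrology", "Water streams"},
    "geo:Watercourse": {"Bodies of water", "Hydrology", "Rivers"},
    "foaf:Person": {"People", "Humans"},
}

for src, tgt in [("dbpedia:Stream", "geo:Watercourse"),
                 ("dbpedia:Stream", "foaf:Person")]:
    print(src, "->", tgt, round(overlap(categories[src], categories[tgt]), 2))
# A high score (here 0.75) would support an alignment; 0.0 rejects one.
```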

    Other authors
    See publication
  • Towards Cloud Mobile Hybrid Application Generation using Semantically Enriched Domain Specific Languages

    International Workshop on Mobile Computing and Clouds (MobiCloud 2010)

    The advancements in computing have resulted in a boom of cheap, ubiquitous, connected mobile devices as well as seemingly unlimited, utility-style, pay-as-you-go computing resources, commonly referred to as Cloud computing. Taking advantage of this computing landscape, however, has been hampered by the many heterogeneities that exist in the mobile space as well as the Cloud space. This research attempts to introduce a disciplined methodology to develop Cloud-mobile hybrid applications by using a Domain Specific Language (DSL)-centric approach to generate applications. A Cloud-mobile hybrid is an application that is split between a Cloud-based back-end and a mobile-device-based front-end. We present MobiCloud, our prototype system built on a DSL, which is capable of developing these hybrid applications. This not only reduces the learning curve but also shields the developers from the native complexities of the target platforms. We also present our vision on propelling this research forward by enriching the DSLs with semantics. The high-level vision is outlined in the ambitious Cirrocumulus project, the driving principle being 'write once, run on any device'.

    Other authors
    See publication
  • A Taxonomy-based Model for Expertise Extrapolation

    4th International Conference on Semantic Computing

    While many ExpertFinder applications succeed in finding experts, their techniques are not always designed to capture the various levels at which expertise can be expressed. Indeed, expertise can be inferred from relationships between topics and subtopics in a taxonomy. The conventional wisdom is that expertise in subtopics is indicative of expertise in higher-level topics as well. The enrichment of Expertise Profiles for finding experts can therefore be facilitated by taking domain hierarchies into account. We present a novel semantics-based model for finding experts, expertise levels and collaboration levels in a peer review context, such as composing a Program Committee (PC) for a conference. The implicit coauthorship network encompassed by bibliographic data enables the possibility of discovering unknown experts within various degrees of separation in the coauthorship graph. Our results show an average of 85% recall in finding experts, when evaluated against three WWW Conference PCs, and close to 80 additional comparable experts outside the immediate collaboration network of the PC Chairs.
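
    The extrapolation idea, expertise on a subtopic contributing (with damping) to its ancestor topics, can be sketched as below; the taxonomy, scores, and damping factor are invented for illustration and are not the paper's model:

```python
# Hedged sketch of taxonomy-based expertise extrapolation: expertise observed
# on subtopics also counts (with damping) toward parent topics.
taxonomy = {            # child topic -> parent topic (hypothetical)
    "ontology alignment": "semantic web",
    "linked data": "semantic web",
    "semantic web": "world wide web",
}

def extrapolate(direct_scores, damping=0.5):
    scores = dict(direct_scores)
    for topic, score in direct_scores.items():
        parent, s = taxonomy.get(topic), score
        while parent:                 # walk up, decaying the contribution
            s *= damping
            scores[parent] = scores.get(parent, 0.0) + s
            parent = taxonomy.get(parent)
    return scores

print(extrapolate({"ontology alignment": 0.8, "linked data": 0.6}))
# {'ontology alignment': 0.8, 'linked data': 0.6,
#  'semantic web': 0.7, 'world wide web': 0.35}
```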

    Other authors
    See publication
  • Provenance Context Entity (PaCE): Scalable Provenance Tracking for Scientific RDF Data

    Proceedings of the 22nd International Scientific and Statistical Database Management Conference

    The Resource Description Framework (RDF) format is being used by a large number of scientific applications to store and disseminate their datasets. The provenance information, describing the source or lineage of the datasets, is playing an increasingly significant role in ensuring data quality, computing trust value of the datasets, and ranking query results. Current provenance tracking approaches using the RDF reification vocabulary suffer from a number of known issues, including lack of formal semantics, use of blank nodes, and application-dependent interpretation of reified RDF triples. In this paper, we introduce a new approach called Provenance Context Entity (PaCE) that uses the notion of provenance context to create provenance-aware RDF triples. We also define the formal semantics of PaCE through a simple extension of the existing RDF(S) semantics that ensures compatibility of PaCE with existing Semantic Web tools and implementations. We have implemented the PaCE approach in the Biomedical Knowledge Repository (BKR) project at the US National Library of Medicine. The evaluations demonstrate a minimum of 49% reduction in the total number of provenance-specific RDF triples generated using the PaCE approach as compared to RDF reification. In addition, performance for complex queries improves by three orders of magnitude and remains comparable to the RDF reification approach for simpler provenance queries.

    Access paper at: http://knoesis.org/library/resource.php?id=797
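
    To make the contrast concrete, a hedged rdflib sketch of the two styles; all names in the ex: namespace are hypothetical, and the PaCE triples below only illustrate the provenance-context idea, not the paper's exact URI-minting scheme:

```python
# Sketch contrasting RDF reification with a PaCE-style provenance context.
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/")
g = Graph()

# Reification: four extra triples per statement, via a statement identifier.
g.add((EX.stmt1, RDF.type, RDF.Statement))
g.add((EX.stmt1, RDF.subject, EX.gene1))
g.add((EX.stmt1, RDF.predicate, EX.associatedWith))
g.add((EX.stmt1, RDF.object, EX.disease1))
g.add((EX.stmt1, EX.derivedFrom, EX.pubmed123))

# PaCE-style alternative: mint subject/object URIs specific to a provenance
# context, so the assertion itself stays a plain, directly queryable triple.
g.add((EX.gene1_pubmed123, EX.associatedWith, EX.disease1_pubmed123))
g.add((EX.gene1_pubmed123, EX.derivedFrom, EX.pubmed123))

print(g.serialize(format="turtle"))
```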

    Other authors
    See publication
  • Provenance Aware Linked Sensor Data

    2nd Workshop on Trust and Privacy on the Social and Semantic Web

    Provenance, from the French word 'provenir', describes the lineage or history of a data entity. Provenance is critical information in the sensors domain to identify a sensor and analyze the observation data over time and geographical space. In this paper, we present a framework to model and query the provenance information associated with the sensor data exposed as part of the Web of Data using the Linked Open Data conventions. This is accomplished by developing an ontology-driven provenance management infrastructure that includes a representation model and query infrastructure. This provenance infrastructure, called Sensor Provenance Management System (PMS), is underpinned by a domain specific provenance ontology called Sensor Provenance (SP) ontology. The SP ontology extends the Provenir upper level provenance ontology to model domain-specific provenance in the sensor domain. In this paper, we describe the implementation of the Sensor PMS for provenance tracking in the Linked Sensor Data.

    Other authors
    See publication
  • Contextual Ontology Alignment of LOD with an Upper Ontology: A Case Study with Proton.

    Springer/LNCS

    The Linked Open Data (LOD) is a major milestone towards realizing the Semantic Web vision, and can enable applications such as robust Question Answering (QA) systems that can answer queries requiring multiple, disparate information sources. However, realizing these applications requires relationships at both the schema and instance level, but currently the LOD only provides relationships for the latter. To address this limitation, we present a solution for automatically finding schema-level links between two LOD ontologies -- in the sense of ontology alignment. Our solution, called BLOOMS+, extends our previous solution (i.e. BLOOMS) in two significant ways. BLOOMS+ 1) uses a more sophisticated metric to determine which classes between two ontologies to align, and 2) considers contextual information to further support (or reject) an alignment. We present a comprehensive evaluation of our solution using schema-level mappings from LOD ontologies to Proton (an upper level ontology) -- created manually by human experts for a real world application called FactForge. We show that our solution performed well on this task. We also show that our solution significantly outperformed existing ontology alignment solutions (including our previously published work on BLOOMS) on this same task.

    Other authors
    See publication
  • Linked Sensor Data

    Proceedings of 2010 International Symposium on Collaborative Technologies and Systems (CTS 2010)

    A number of government, corporate, and academic organizations are collecting enormous amounts of data provided by environmental sensors. However, this data is too often locked within organizations and underutilized by the greater community. In this paper, we present a framework to make this sensor data openly accessible by publishing it on the Linked Open Data (LOD) Cloud. This is accomplished by converting raw sensor observations to RDF and linking with other datasets on LOD. With such a framework, organizations can make large amounts of sensor data openly accessible, thus allowing greater opportunity for utilization and analysis.
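
    A minimal sketch of the conversion-and-linking step using rdflib; the property names and the GeoNames link are placeholders, not the actual ontology or dataset used in the paper:

```python
# Illustrative only: publish one raw observation as RDF and link it to LOD.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import XSD

EX = Namespace("http://example.org/sensors/")
g = Graph()

obs = EX["obs/station-42/2010-03-01T00:00:00Z"]
g.add((obs, EX.observedProperty, EX.AirTemperature))
g.add((obs, EX.hasValue, Literal(4.2, datatype=XSD.double)))
g.add((obs, EX.observedBy, EX["station-42"]))
# The link into the LOD cloud: tie the station to a GeoNames place URI
# (the feature id below is illustrative).
g.add((EX["station-42"], EX.nearbyLocation,
       URIRef("http://sws.geonames.org/4509177/")))

print(g.serialize(format="turtle"))
```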

    Other authors
    See publication
  • A Qualitative Examination of Topical Tweet and Retweet Practices

    4th Int'l AAAI Conference on Weblogs and Social Media (ICWSM)

    This work contributes to the study of retweet behavior on Twitter surrounding real-world events. We analyze over a million tweets pertaining to three events, present general tweet properties in such topical datasets and qualitatively analyze the properties of the retweet behavior surrounding the most tweeted/viral content pieces. Findings include a clear relationship between sparse/dense retweet patterns and the content and type of a tweet itself, suggesting the need to study content properties in link-based diffusion models.

    Other authors
    See publication
  • Pattern-Based Synonym and Antonym Extraction

    ACM Southeast Conference 2010, ACMSE2010

    Many research studies adopt manually selected patterns for semantic relation extraction. However, manually identifying and discovering patterns is time-consuming, and it is difficult to discover all potential candidates. Instead, we propose an automatic pattern construction approach to extract verb synonyms and antonyms from English newspapers. Rather than relying on a single pattern, we combine the results indicated by multiple patterns to maximize recall.
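
    The flavor of the approach, pooling matches from several lexical patterns, can be sketched as below; these two hand-written pattern sets merely stand in for the automatically constructed ones:

```python
# Illustrative pattern-based extraction; the patterns here are common textual
# templates, not the ones the paper induced automatically.
import re

SYNONYM_PATTERNS = [
    re.compile(r"(\w+), also (?:known|called) as (\w+)"),
    re.compile(r"(\w+), i\.e\., (\w+)"),
]
ANTONYM_PATTERNS = [
    re.compile(r"(\w+) rather than (\w+)"),
    re.compile(r"either (\w+) or (\w+)"),
]

def extract(text):
    # Combine results from multiple patterns to maximize recall.
    syn = [m for p in SYNONYM_PATTERNS for m in p.findall(text)]
    ant = [m for p in ANTONYM_PATTERNS for m in p.findall(text)]
    return syn, ant

text = ("Prices may rise rather than fall this quarter; the index was "
        "boosted, also known as lifted, by strong earnings.")
print(extract(text))
# ([('boosted', 'lifted')], [('rise', 'fall')])
```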

    Other authors
    See publication
  • What Goes Around Comes Around – Improving Linked Open Data through On-Demand Model Creation

    Web Science Conference 2010 - WebSci10

    First Author: Christopher Thomas

    We present a method for growing the amount of knowledge available on the Web using a hermeneutic approach that involves background knowledge, Information Extraction techniques and validation through discourse and use of the extracted information. We exemplify this using Linked Data as background knowledge, automatic Model/Ontology creation for the IE part, and a Semantic Browser for evaluation. The hermeneutic approach, however, can be used with other IE techniques and other evaluation methods. We will present results from the model creation and anecdotal evidence for the feasibility of 'Validation through Use'.

    Other authors
    See publication
  • Linked Data Is Merely More Data

    AAAI Spring Symposium

    In this position paper, we argue that the Linked Open Data (LoD) Cloud, in its current form, is only of limited value for furthering the Semantic Web vision. Being merely a weakly linked 'triple collection', it will only be of very limited benefit for the AI or Semantic Web communities. We describe the corresponding problems with the LoD Cloud and give directions for research to remedy the situation.

    Other authors
    See publication
  • SPARQL Query Re-writing for Spatial Datasets Using Partonomy Based Transformation Rules

    Third International Conference on Geospatial Semantics (GeoS 2009)

    Often the information present in a spatial knowledge base is represented at a different level of granularity and abstraction than the query constraints. For querying ontologies containing spatial information, the precise relationships between spatial entities have to be specified in the basic graph pattern of a SPARQL query, which can result in long and complex queries. We present a novel approach to help users intuitively write SPARQL queries to query spatial data, rather than relying on knowledge of the ontology structure. Our framework re-writes queries, using transformation rules to exploit part-whole relations between geographical entities and address the mismatches between query constraints and the knowledge base. Our experiments were performed on completely third-party datasets and queries. Evaluations were performed on the Geonames dataset using questions from the National Geographic Bee serialized into SPARQL, and on the British Administrative Geography Ontology using questions from a popular trivia website. These experiments demonstrate high precision in retrieval of results and ease in writing queries.
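
    A hedged sketch of the rewriting idea: a direct containment constraint is relaxed so that entities recorded at a finer granularity still match. The predicate names are illustrative, and the SPARQL 1.1 property path stands in for the paper's transformation rules:

```python
# Illustrative query rewriting: relax a direct "located in" constraint into a
# transitive part-of path. Predicates in the ex: namespace are hypothetical.
def rewrite(query: str) -> str:
    # ex:partOf* reaches any ancestor region via part-whole relations.
    return query.replace("ex:locatedIn", "ex:locatedIn/ex:partOf*")

user_query = """
SELECT ?river WHERE {
  ?river a ex:River .
  ?river ex:locatedIn ex:England .
}
"""
print(rewrite(user_query))
# The rewritten pattern "?river ex:locatedIn/ex:partOf* ex:England" also
# returns rivers asserted only for counties that are part of England.
```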

    Other authors
    See publication
  • Sensor Discovery on Linked Data

    Technical Report

    There has been a drive recently to make sensor data accessible on the Web. However, because of the vast number of sensors collecting data about our environment, finding relevant sensors on the Web is a non-trivial challenge. In this paper, we present an approach to discovering sensors through a standard service interface over Linked Data. This is accomplished with a semantic sensor network middleware that includes a sensor registry on Linked Data and a sensor discovery service that extends the OGC Sensor Web Enablement. With this approach, we are able to access and discover sensors that are positioned near named-locations of interest.

    Other authors
    See publication
  • Twitris: Socially Influenced Browsing

    Semantic Web Challenge, International Semantic Web Conference 2009

    First Author: Ashutosh Jadhav

    In this paper, we present Twitris, a semantic Web application that facilitates browsing for news and information, using social perceptions as the fulcrum. In doing so we address challenges in large-scale crawling, processing of real-time information, and preserving spatiotemporal-thematic properties central to observations pertaining to real-time events. We extract metadata about events from Twitter and bring related news and Wikipedia articles to the user. In developing Twitris, we have used the DBpedia ontology.

    Other authors
    See publication
  • Spatio-Temporal-Thematic Analysis of Citizen Sensor Data: Challenges and Experiences

    Web Information Systems Engineering

    We present work in the spatio-temporal-thematic analysis of citizen-sensor observations pertaining to real-world events. Using Twitter as a platform for obtaining crowd-sourced observations, we explore the interplay between these 3 dimensions in extracting insightful summaries of social perceptions behind events. We present our experiences in building a web mashup application, Twitris (http://twitris.knoesis.org) that extracts and facilitates the spatio-temporal-thematic exploration of event descriptor summaries.

    Other authors
    See publication
  • SemSOS: Semantic Sensor Observation Service

    International Symposium on Collaborative Technologies and Systems

    Sensor Observation Service (SOS) is a Web service specification defined by the Open Geospatial Consortium (OGC) Sensor Web Enablement (SWE) group in order to standardize the way sensors and sensor data are discovered and accessed on the Web. This standard goes a long way in providing interoperability between repositories of heterogeneous sensor data and applications that use this data. Many of these applications, however, are ill-equipped to handle raw sensor data as provided by SOS and require actionable knowledge of the environment in order to be practically useful. There are two approaches to deal with this obstacle: make the applications smarter or make the data smarter. We propose the latter option and accomplish this by leveraging semantic technologies in order to provide and apply more meaningful representation of sensor data. More specifically, we are modeling the domain of sensors and sensor observations in a suite of ontologies, adding semantic annotations to the sensor data, using the ontology models to reason over sensor observations, and extending an open source SOS implementation with our semantic knowledge base. This semantically enabled SOS, or SemSOS, provides the ability to query high-level knowledge of the environment as well as low-level raw sensor data.
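
    A toy illustration of "making the data smarter": a rule over annotated observations yields queryable high-level knowledge. The property names and thresholds are invented, and SemSOS itself uses ontology-based reasoning rather than hard-coded rules:

```python
# Illustrative rule: infer a high-level condition from low-level observations.
def infer_conditions(observations):
    inferred = []
    snow = [o["value"] for o in observations if o["property"] == "SnowDepth"]
    wind = [o["value"] for o in observations if o["property"] == "WindSpeed"]
    # Hypothetical thresholds: deep snow plus strong wind -> blizzard.
    if snow and wind and max(snow) > 0.35 and max(wind) > 15.0:
        inferred.append("BlizzardCondition")  # queryable high-level knowledge
    return inferred

obs = [
    {"sensor": "s1", "property": "SnowDepth", "value": 0.5},   # metres
    {"sensor": "s2", "property": "WindSpeed", "value": 18.2},  # m/s
]
print(infer_conditions(obs))  # ['BlizzardCondition']
```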

    Other authors
    See publication
  • Service level agreement in cloud computing

    OOPSLA

    Cloud computing, which provides cheap, pay-as-you-go computing resources, is rapidly gaining momentum as an alternative to traditional IT infrastructure. As more and more consumers delegate their tasks to cloud providers, Service Level Agreements (SLAs) between consumers and providers emerge as a key aspect. Due to the dynamic nature of the cloud, continuous monitoring of Quality of Service (QoS) attributes is necessary to enforce SLAs. Numerous other factors, such as trust (in the cloud provider), also come into consideration, particularly for enterprise customers that may outsource their critical data. This complex nature of the cloud landscape warrants a sophisticated means of managing SLAs. This paper proposes a mechanism for managing SLAs in a cloud computing environment using the Web Service Level Agreement (WSLA) framework, developed for SLA monitoring and SLA enforcement in a Service Oriented Architecture (SOA). We use the third-party support feature of WSLA to delegate monitoring and enforcement tasks to other entities in order to solve the trust issues. We also present a real-world use case to validate our proposal.
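
    A minimal sketch of the monitoring side, checking measured QoS values against agreed service-level objectives; the metric names and thresholds are placeholders, and this is Python pseudodata rather than WSLA syntax:

```python
# Illustrative SLA monitoring: flag measurements that violate agreed objectives.
from dataclasses import dataclass

@dataclass
class SLO:
    metric: str
    threshold: float
    direction: str  # "max": value must stay below; "min": must stay above

def check_sla(slos, measurements):
    violations = []
    for slo in slos:
        value = measurements.get(slo.metric)
        if value is None:
            continue  # a real monitor would treat missing data as suspect
        ok = (value <= slo.threshold) if slo.direction == "max" \
             else (value >= slo.threshold)
        if not ok:
            violations.append((slo.metric, value, slo.threshold))
    return violations

slos = [SLO("response_time_ms", 200.0, "max"), SLO("availability", 0.999, "min")]
print(check_sla(slos, {"response_time_ms": 350.0, "availability": 0.9995}))
# [('response_time_ms', 350.0, 200.0)] -> trigger the agreed enforcement action
```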

    Other authors
    See publication
  • A Faceted Classification Based Approach to Search and Rank Web APIs

    International Conference on Web Services

    Web application hybrids, popularly known as mashups, are created by integrating services on the Web using their APIs. Support for finding an API is currently provided by generic search engines or domain-specific solutions such as ... Shortcomings of both these solutions, including their reliance on user tags, make the task of identifying an API challenging. Since these APIs are described in HTML documents, it is essential to look beyond the boundaries of current approaches to Web service discovery that rely on formal descriptions. In this work, we present a faceted approach to searching and ranking Web APIs that takes into consideration attributes or facets of the APIs as found in their HTML descriptions. Our method adopts current research in document classification and faceted search and introduces the serviut score to rank APIs based on their utilization and popularity. We evaluate classification, search accuracy and ranking effectiveness using available APIs while contrasting our solution with existing ones.

    Full citation: Karthik Gomadam, Ajith Ranabahu, Meenakshi Nagarajan, Amit P. Sheth, and Kunal Verma, 'A Faceted Classification Based Approach to Search and Rank Web APIs,' ICWS 2008, pp. 177-184.
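
    The serviut formula itself is not reproduced here; the toy ranker below only illustrates the stated intuition of combining facet-match relevance with a utilization signal such as mashup counts:

```python
# Illustrative facet-plus-utilization ranking; the weights, facets, and counts
# are invented and this is not the paper's serviut computation.
def rank_apis(apis, query_facets, w_util=0.4):
    max_util = max(a["mashups"] for a in apis) or 1
    def score(api):
        matched = len(set(api["facets"]) & set(query_facets))
        relevance = matched / len(query_facets)
        return (1 - w_util) * relevance + w_util * api["mashups"] / max_util
    return sorted(apis, key=score, reverse=True)

apis = [
    {"name": "MapAPI-A", "facets": ["mapping", "rest", "json"], "mashups": 1200},
    {"name": "MapAPI-B", "facets": ["mapping", "soap"], "mashups": 40},
]
for api in rank_apis(apis, query_facets=["mapping", "json"]):
    print(api["name"])
# MapAPI-A ranks first: it matches more facets and is far more widely utilized.
```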

    Other authors
    See publication
  • Mediatability: Estimating the Degree of Human Involvement in XML Schema Mediation

    International Conference on Semantic Computing

    Mediation and integration of data are significant challenges because the number of services on the Web, and heterogeneities in their data representation, continue to increase rapidly. To address these challenges we introduce a new measure, mediatability, which is a quantifiable and computable metric for the degree of human involvement in XML schema mediation. We present an efficient algorithm to compute mediatability and an experimental study to analyze how semantic annotations affect the ease of mediating between two schemas. We validate our approach by comparing mediatability scores generated by our system with user-perceived difficulty. We also evaluate the scalability of our system on a large number of existing APIs.
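
    A hedged sketch of a mediatability-like score, here taken as the fraction of target-schema elements mappable automatically by name or by semantic annotation; this illustrates the idea, not the paper's algorithm:

```python
# Illustrative score: what share of the target schema maps automatically?
# The remainder is a rough proxy for required human involvement.
def mediatability(source_elems, target_elems, annotations=None):
    annotations = annotations or {}  # target element -> equivalent source name
    auto = sum(
        1 for t in target_elems
        if t in source_elems or annotations.get(t) in source_elems
    )
    return auto / len(target_elems)

source = {"firstName", "lastName", "zip"}
target = ["givenName", "lastName", "postalCode"]
ann = {"givenName": "firstName", "postalCode": "zip"}  # semantic annotations

print(round(mediatability(source, target), 2))       # 0.33: much human effort
print(round(mediatability(source, target, ann), 2))  # 1.0: annotations help
```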

    Other authors
    See publication
  • Semantic Provenance for eScience: Managing the Deluge of Scientific Data

    IEEE Internet Computing

    Provenance information in eScience is metadata that's critical to effectively manage the exponentially increasing volumes of scientific data from industrial-scale experiment protocols. Semantic provenance, based on domain-specific provenance ontologies, lets software applications unambiguously interpret data in the correct context. The semantic provenance framework for eScience data comprises expressive provenance information and domain-specific provenance ontologies and applies this information to data management. The authors' "two degrees of separation" approach advocates the creation of high-quality provenance information using specialized services. In contrast to workflow engines generating provenance information as a core functionality, the specialized provenance services are integrated into a scientific workflow on demand. This article describes an implementation of the semantic provenance framework for glycoproteomics.

    Satya S. Sahoo, Amit Sheth, and Cory Henson, 'Semantic Provenance for eScience: Managing the Deluge of Scientific Data', IEEE Internet Computing, vol. 12, no. 4, 2008, pp. 46-54.

    Access: http://www.knoesis.org/library/resource.php?id=00310

    Other authors
    See publication
  • Semantic Sensor Web

    IEEE Internet Computing

    In recent years, sensors have been increasingly adopted by a diverse array of disciplines, such as meteorology for weather forecasting and wildfire detection, civic planning for traffic management, satellite imaging for earth and space observation, medical sciences for patient care using biometric sensors, and homeland security for radiation and biochemical detection at ports. Sensors are thus distributed across the globe, leading to an avalanche of data about our environment. The rapid development and deployment of sensor technology involves many different types of sensors, both remote and in situ, with diverse capabilities such as range, modality, and maneuverability. Today, it's possible to use sensor networks to detect and identify a multitude of observations, from simple phenomena to complex events and situations. The lack of integration and communication between these networks, however, often isolates important data streams and intensifies the existing problem of too much data and not enough knowledge. With a view to addressing this problem, we discuss a semantic sensor Web (SSW) in which sensor data is annotated with semantic metadata to increase interoperability as well as provide contextual information essential for situational knowledge. In particular, this involves annotating sensor data with spatial, temporal, and thematic semantic metadata.

    Cite: Amit Sheth, Cory Henson, and Satya Sahoo, 'Semantic Sensor Web,' IEEE Internet Computing, 12 (4), July/August 2008, pp. 78-83.

    Access: http://knoesis.wright.edu/library/resource.php?id=00311

    Other authors
    See publication
  • An ontology-driven semantic mashup of gene and biological pathway information: application to the domain of nicotine dependence.

    Journal of Biomedical Informatics

    OBJECTIVES:

    This paper illustrates how Semantic Web technologies (especially RDF, OWL, and SPARQL) can support information integration and make it easy to create semantic mashups (semantically integrated resources). In the context of understanding the genetic basis of nicotine dependence, we integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base.

    METHODS:

    We use an ontology-driven approach to integrate two gene resources (Entrez Gene and HomoloGene) and three pathway resources (KEGG, Reactome and BioCyc), for five organisms, including humans. We created the Entrez Knowledge Model (EKoM), an information model in OWL for the gene resources, and integrated it with the extant BioPAX ontology designed for pathway resources. The integrated schema is populated with data from the pathway resources, publicly available in BioPAX-compatible format, and gene resources for which a population procedure was created. The SPARQL query language is used to formulate queries over the integrated knowledge base to answer the three biological queries.

    RESULTS:

    Simple SPARQL queries could easily identify hub genes, i.e., those genes whose gene products participate in many pathways or interact with many other gene products. The identification of the genes expressed in the brain turned out to be more difficult, due to the lack of a common identification scheme for proteins.

    CONCLUSION:

    Semantic Web technologies provide a valid framework for information integration in the life sciences. Ontology-driven integration represents a flexible, sustainable and extensible solution to the integration of large volumes of information. Additional resources, which enable the creation of mappings between information sources, are required to compensate for heterogeneity across namespaces.

    PMID: 18395495

    Access:
    http://www.knoesis.org/library/resource.php?id=00221
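
    The hub-gene result can be illustrated with a small rdflib sketch of the kind of SPARQL aggregation involved; the namespace and predicate are placeholders for the EKoM/BioPAX terms used in the paper:

```python
# Illustrative hub-gene query: count pathways per gene over a toy graph.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/bio/")
g = Graph()
for gene, pathway in [("CHRNA4", "p1"), ("CHRNA4", "p2"), ("CHRNA4", "p3"),
                      ("DRD2", "p1")]:
    g.add((EX[gene], EX.participatesIn, EX[pathway]))

hub_query = """
PREFIX ex: <http://example.org/bio/>
SELECT ?gene (COUNT(?pathway) AS ?n) WHERE {
  ?gene ex:participatesIn ?pathway .
} GROUP BY ?gene ORDER BY DESC(?n)
"""
for row in g.query(hub_query):
    print(row.gene, row.n)  # CHRNA4 3, DRD2 1 -> CHRNA4 is the hub gene here
```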

    Other authors
    See publication
  • A Semantic Framework for Identifying Events in a Service Oriented Architecture.

    International Conference on Web Services

    We propose a semantic framework for automatically identifying events as a step towards developing an adaptive middleware for Service Oriented Architecture (SOA). Current related research focuses on adapting to events that violate certain non-functional objectives of the service requestor. Given the large number of events that can happen during the execution of a service, identifying events that can impact the non-functional objectives of a service request is a key challenge. To address this problem we propose an approach that allows service requestors to create semantically rich service requirement descriptions, called semantic templates. We propose a formal model for expressing semantic templates and for measuring the relevance of an event to both the action being performed and the non-functional objectives. This model is extended to adjust the relevance of the events based on feedback from the underlying adaptation framework. We present an algorithm that utilizes multiple ontologies for identifying relevant events and present our evaluations that measure the efficiency of both the event identification and the subsequent adaptation scheme.

    Other authors
    See publication
  • Automatic Composition of Semantic Web Services Using Process Mediation

    9th International Conference on Enterprise Information Systems (ICEIS 2007), Funchal, Portugal, June 12–16, 2007, pp. 453–461.

    Web service composition has quickly become a key area of research in the services oriented architecture community. One of the challenges in composition is the existence of heterogeneities across independently created and autonomously managed Web service requesters and Web service providers. Previous work in this area either involved significant human effort or, in the case of efforts seeking to provide largely automated approaches, overlooked the problem of data heterogeneities, resulting in partial solutions that would not support executable workflows for real-world problems. In this paper, we present a planning-based approach to solve both the process heterogeneity and data heterogeneity problems. Our system successfully outputs a BPEL file which correctly solves a non-trivial real-world problem in the 2006 SWS Challenge.

    Full citation:
    Zixin Wu, Karthik Gomadam, Ajith Ranabahu, Amit P. Sheth, and John A. Miller, 'Automatic Composition of Semantic Web Services Using Process Mediation,' in Proceedings of the 9th International Conference on Enterprise Information Systems (ICEIS 2007), Funchal, Portugal, June 12–16, 2007, pp. 453–461.

    Other authors
    See publication
  • From "glycosyltransferase" to "congenital muscular dystrophy": integrating knowledge from NCBI Entrez Gene and the Gene Ontology.

    MedInfo (Stud Health Technol Inform.)

    Entrez Gene (EG), Online Mendelian Inheritance in Man (OMIM) and the Gene Ontology (GO) are three complementary knowledge resources that can be used to correlate genomic data with disease information. However, bridging between genotype and phenotype through these resources currently requires manual effort or the development of customized software. In this paper, we argue that integrating EG and GO provides a robust and flexible solution to this problem. We demonstrate how the Resource Description Framework (RDF) developed for the Semantic Web can be used to represent and integrate these resources and enable seamless access to them as a unified resource. We illustrate the effectiveness of our approach by answering a real-world biomedical query linking a specific molecular function, glycosyltransferase, to the disorder congenital muscular dystrophy.

    Citation: Satya S. Sahoo, Kelly Zeng, Olivier Bodenreider, and Amit P. Sheth, 'From "Glycosyltransferase" to "Congenital Muscular Dystrophy": Integrating Knowledge from NCBI Entrez Gene and the Gene Ontology,' in MEDINFO 2007: Proceedings of the 12th World Congress on Health (Medical) Informatics, K.A. Kuhn, J.R. Warren, T.-Y. Leong (Eds.), Studies in Health Technology and Informatics, Vol. 129, Amsterdam: IOS, August 2007, pp. 1260-1264. PMID: 17911917

    Access: http://www.knoesis.org/library/resource.php?id=00014

    Other authors
    See publication
  • Knowledge modeling and its application in life sciences: a tale of two ontologies

    Proceedings of the 15th international conference on World Wide Web

    High throughput glycoproteomics, similar to genomics and proteomics, involves extremely large volumes of distributed, heterogeneous data as a basis for identification and quantification of a structurally diverse collection of biomolecules. The ability to share, compare, query for and, most critically, correlate datasets using the native biological relationships is among the challenges being faced by glycobiology researchers. As a solution for these challenges, we are building a semantic structure, using a suite of ontologies, which supports management of data and information at each step of the experimental lifecycle. This framework will enable researchers to leverage the large scale of glycoproteomics data to their benefit. In this paper, we focus on the design of these biological ontology schemas with an emphasis on relationships between biological concepts, on the use of novel approaches to populate these complex ontologies, including integrating extremely large datasets ( 500MB) as part of the instance base, and on the evaluation of ontologies using OntoQA metrics. The application of these ontologies in providing informatics solutions for the high throughput glycoproteomics experimental domain is also discussed. We present our experience as a use case of developing two ontologies in one domain, to be part of a set of use cases, which are used in the development of an emergent framework for building and deploying biological ontologies.

    Cite: Satya S. Sahoo, Christopher Thomas, Amit P. Sheth, William York, and Samir Tartir, 'Knowledge Modeling and Its Application in Life Sciences: A Tale of Two Ontologies,' 15th International World Wide Web Conference (WWW2006), Edinburgh, Scotland, May 23-26, 2006.

    Access: http://www.knoesis.org/library/resource.php?id=00020

    Other authors
    See publication
  • Composing semantic web services with interaction protocols

    Technical report, LSDIS Lab, University of Georgia, Athens, Georgia (2006)

    Web service composition has quickly become an important area of research in the services oriented architecture community. One of the challenges in composition is the existence of heterogeneities between independently created and autonomously managed Web service requesters and Web service providers. This paper focuses on the problem of composing Web services in the presence of ordering constraints on their operations imposed by the service providers. We refer to the ordering constraints on a service's operations as its interaction protocol. We present a novel approach to composition involving what we term pseudo operations to expressively capture the service provider's interaction protocol. Pseudo operations are used to resolve heterogeneities by constructing a plan of services in a more intelligent and efficient manner. They accomplish this by utilizing descriptive human knowledge from the service provider and capture this knowledge as part of a planning problem to create more flexible and expressive Web service operations that may be made available to service requesters. We use a customer-retailer scenario to show that this method alleviates planning complexities and generates more robust Web service compositions. Empirical testing was performed using this scenario and compared to existing methods to show the improvement attributable to our method.

    Other authors
    See publication
  • GLYDE-an expressive XML standard for the representation of glycan structure.

    Journal of Carbohydrate Research

    The amount of glycomics data being generated is rapidly increasing as a result of improvements in analytical and computational methods. Correlation and analysis of this large, distributed data set requires an extensible and flexible representational standard that is also 'understood' by a wide range of software applications. An XML-based data representation standard that faithfully captures essential structural details of a glycan moiety, along with additional information (such as data provenance) to aid the interpretation and usage of glycan data, will facilitate the exchange of glycomics data across the scientific community. To meet this need, we introduce the GLYcan Data Exchange (GLYDE) standard as an XML-based representation format to enable interoperability and exchange of glycomics data. An online tool for the conversion of other representations to GLYDE format has been developed.

    Keywords: GLYcan Data Exchange (GLYDE), Glycan data interoperability, XML-based glycan representation, Glycoinformatics

    Cite: Satya S. Sahoo, Christopher Thomas, Amit P. Sheth, Cory Henson, and William S. York, 'GLYDE-An expressive XML standard for the representation of glycan structure,' Carbohydr Res, 340 (no. 18), December 30, 2005, pp. 2802-2807. Epub 2005 Oct 20. PMID: 16242678.

    Access: http://www.knoesis.org/library/resource.php?id=00018

    Other authors
    See publication
  • METEOR-S WSDI: A Scalable Infrastructure of Registries for Semantic Publication and Discovery of Web Services

    Information Technology and Management

    Kunal Verma, Kaarthik Sivashanmugam, Amit Sheth, Abhijit Patil, Swapna Oundhakar, and John Miller, 'METEOR-S WSDI: A Scalable P2P Infrastructure of Registries for Semantic Publication and Discovery of Web Services,' Information Technology and Management 6 (no. 1), January 2005, pp. 17-39.

    Web services are the new paradigm for distributed computing. They have much to offer towards interoperability of applications and integration of large-scale distributed systems. To make Web services accessible to users, service providers use Web service registries to publish them. The current infrastructure of registries requires replication of all Web service publications in all Universal Business Registries (UBRs), which provide text- and taxonomy-based search capabilities. Large growth in the number of Web services as well as in the number of registries would make this replication impractical. In addition, the current Web service discovery mechanism is inefficient. Semantic discovery or matching of services is a promising approach to address this challenge. In this paper, we present a scalable, high-performance environment for federated Web service publication and discovery among multiple registries. This work uses an ontology-based approach to organize registries, enabling semantic classification of all Web services based on domains. Each of these registries supports semantic publication of Web services, which is used during the discovery process. We have implemented two algorithms for semantic publication and one algorithm for semantic discovery of Web services. We believe that the semantic approach suggested in this paper will significantly improve Web service publication and discovery involving a large number of registries. As part of the METEOR-S project, we have leveraged peer-to-peer networking as a scalable infrastructure for registries that can support automated and semi-automated Web service publication and discovery.
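
    The ontology-based organization can be pictured as matching a query concept against service annotations by taxonomy subsumption rather than keyword search. The sketch below is a toy rendering: the taxonomy, registries, and service names are invented for illustration.

        # Toy taxonomy-aware discovery across domain-organized registries.
        TAXONOMY = {"UsedCarSales": "CarSales", "CarSales": "Sales", "Sales": None}

        def subsumes(general, specific):
            """True if `general` equals `specific` or is one of its ancestors."""
            while specific is not None:
                if specific == general:
                    return True
                specific = TAXONOMY.get(specific)
            return False

        REGISTRIES = {
            "auto": [("QuoteService", "CarSales"), ("AuctionService", "UsedCarSales")],
            "retail": [("CheckoutService", "Sales")],
        }

        def discover(concept):
            return [name for services in REGISTRIES.values()
                    for name, annotation in services if subsumes(concept, annotation)]

        print(discover("CarSales"))  # ['QuoteService', 'AuctionService']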

    Other authors
    See publication
  • METEOR-S Web Service Annotation Framework (MWSAF)

    Proceedings of the Thirteenth International World Wide Web Conference

    Patil A., Oundhakar S., Sheth A., and Verma K., “METEOR-S Web Service Annotation Framework (MWSAF)”, Proceedings of the Thirteenth International World Wide Web Conference (WWW2004), May 2004, pp. 553-562.

    Other authors
    See publication
  • The METEOR-S Approach for Configuring and Executing Dynamic Web Processes

    LSDIS Technical Report 05-001, 2005.

    Web processes are the next-generation workflows created using Web services. This paper addresses research issues in creating a framework for configuring and executing dynamic Web processes. The configuration module uses Semantic Web service discovery, integer linear programming, and logic-based constraint satisfaction to configure the process, based on quantitative and non-quantitative process constraints. Semantic representation of Web services and process constraints is used to achieve dynamic configuration. An execution environment is presented, which can handle heterogeneities at the protocol and data level by using proxies with data and protocol mediation capabilities. In case of Web service failures, we present an approach to reconfigure the process at run time, without violating the process constraints. Empirical testing of the execution environment is performed to compare deployment-time and run-time binding.

    Other authors
    See publication
  • Transactions in Transactional Workflows

    Advanced Transaction Models and Architectures, S. Jajodia and L. Kerschberg (Eds.), Kluwer Academic Publishers, 1997, pp. 3-34.

    Workflow management systems (WFMSs) are finding wide applicability in small and large organizational settings. Advanced transaction models (ATMs) focus on maintaining data consistency and have provided solutions to many problems such as correctness, consistency, and reliability in transaction processing and database management environments. While such problems have yet to be solved in the domain of workflow systems, database researchers have proposed to use, or attempted to use, ATMs to model workflows. In this paper we survey the work done in the area of transactional workflow systems. We then argue that workflow requirements in large-scale enterprise-wide applications involving heterogeneous and distributed environments either differ from or exceed the modeling and functionality support provided by ATMs. We propose that an ATM is unlikely to provide the primary basis for modeling of workflow applications, and subsequently workflow management. We discuss a framework for error handling and recovery in the METEOR WFMS that borrows from relevant work in ATMs, distributed systems, software engineering, and organizational sciences. We also present various connotations of transactions in real-world organizational processes today. Finally, we point out the need for looking beyond ATMs and for using a multi-disciplinary approach for modeling the large-scale workflow applications of the future.

    Other authors
    See publication
  • Task Scheduling Using Intertask Dependencies in Carnot

    ACM SIGMOD Intl. Conf. On the Management of Data

    The Carnot Project at MCC is addressing the problem of logically unifying physically-distributed, enterprise-wide, heterogeneous information. Carnot will provide a user with the means to navigate information efficiently and transparently, to update that information consistently, and to write applications easily for large, heterogeneous, distributed information systems. A prototype has been implemented which provides services for (a) enterprise modeling and model integration to create an enterprise-wide view, (b) semantic expansion of queries on the view to queries on individual resources, and (c) inter-resource consistency management. This paper describes the Carnot approach to transaction processing in environments where heterogeneous, distributed, and autonomous systems are required to coordinate the update of the local information under their control. In this approach, subtransactions are represented as a set of tasks and a set of intertask dependencies that capture the semantics of a particular relaxed transaction model. A scheduler has been implemented which schedules the execution of these tasks in the Carnot environment so that all intertask dependencies are satisfied.

    Other authors
    See publication
  • Entity Recommendations Using Hierarchical Knowledge Bases

    -

    Recent developments in recommendation algorithms have focused on integrating Linked Open Data to augment traditional algorithms with background knowledge. These developments recognize that the integration of Linked Open Data may offer better performance, particularly in cold-start cases. In this paper, we explore if and how a specific type of Linked Open Data, namely hierarchical knowledge, may be utilized for recommendation systems. We propose a content-based recommendation approach that adapts a spreading activation algorithm over the DBpedia category structure to identify entities of interest to the user. Evaluation of the algorithm over the MovieLens dataset demonstrates that our method yields more accurate recommendations compared to a previously proposed taxonomy-driven approach for recommendations.
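
    The core of the approach can be sketched as activation flowing from the categories of items a user liked, up the category hierarchy, with decay per hop. The parent map, decay factor, and hop count below are illustrative assumptions standing in for the real DBpedia category graph and any tuned parameters.

        # Minimal spreading activation over a toy category hierarchy.
        DECAY = 0.5
        CATEGORY_PARENTS = {
            "Film_noir": ["Film_genres"],
            "Heist_films": ["Film_genres"],
            "Film_genres": ["Films"],
        }

        def spread(seeds, hops=2):
            activation, frontier = dict(seeds), dict(seeds)
            for _ in range(hops):
                nxt = {}
                for node, a in frontier.items():
                    for parent in CATEGORY_PARENTS.get(node, []):
                        nxt[parent] = nxt.get(parent, 0.0) + a * DECAY
                for node, a in nxt.items():
                    activation[node] = activation.get(node, 0.0) + a
                frontier = nxt
            return activation

        # Activation seeded from categories of movies the user rated highly:
        print(spread({"Film_noir": 1.0, "Heist_films": 1.0}))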

    Other authors
  • Moving beyond sameAs with PLATO: Partonomy detection for Linked Data

    ACM, New York, NY, USA.

    The Linked Open Data (LOD) Cloud has gained significant traction over the past few years. With over 275 interlinked datasets across diverse domains such as life science, geography, politics, and more, the LOD Cloud has the potential to support a variety of applications ranging from open domain question answering to drug discovery.

    Despite its significant size (approx. 30 billion triples), the data is relatively sparsely interlinked (approx. 400 million links). A semantically richer LOD Cloud is needed to fully realize its potential. Data in the LOD Cloud are currently interlinked mainly via the owl:sameAs property, which is inadequate for many applications. Additional properties capturing relations based on causality or partonomy are needed to enable the answering of complex questions and to support applications.

    In this paper, we present a solution to enrich the LOD Cloud by automatically detecting partonomic relationships, which are well-established, fundamental properties grounded in linguistics and philosophy. We empirically evaluate our solution across several domains, and show that our approach performs well on detecting partonomic properties between LOD Cloud data.
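
    As a simplified illustration of what detecting a partonomic relation involves, the sketch below matches two lexical part-whole patterns in raw text. PLATO's actual method is substantially richer; the patterns here are stand-ins for exposition.

        # Naive pattern-based extraction of part-whole (partonomic) relations.
        import re

        PATTERNS = [
            r"(?P<part>\w[\w ]*?) is a part of (?P<whole>\w[\w ]*)",
            r"(?P<whole>\w[\w ]*?) consists of (?P<part>\w[\w ]*)",
        ]

        def extract_part_of(sentence):
            for pattern in PATTERNS:
                m = re.search(pattern, sentence, flags=re.IGNORECASE)
                if m:
                    return m.group("part").strip(), m.group("whole").strip()
            return None

        print(extract_part_of("The hippocampus is a part of the brain"))
        # ('The hippocampus', 'the brain')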

Patents

  • Methods and systems for analysis of real-time user-generated text messages

    Issued US US20120042022 A1

    The present invention generally relates to methods and systems for analysis of real-time user-generated text messages. The methods and systems allow analysis to be performed using term associations and geographical and temporal constraints.

    Other inventors
    See patent
  • System and method for creating a Semantic Web and its applications in Browsing, Searching, Profiling, Personalization and Advertising

    Issued US 6311194

    A system and method for creating a database of metadata (metabase) of a variety of digital media content, including TV and radio content delivered on the Internet. This semantic-based method captures and enhances domain- or subject-specific metadata of digital media content, including the specific meaning and intended use of the original content. To support semantics, a WorldModel is provided that includes specific domain knowledge, ontologies, as well as a set of rules relevant to the original content. The metabase may also be dynamic in that it may track changes to any variety of accessible content, including live and archival TV and radio programming.

    WorldModel = Ontology

    Other inventors
    See patent
  • Method and system for providing uniform access to heterogeneous information

    Issued US WO1997015018 A1

    Our invention is a system and methodology for integrating heterogeneous information in a distributed environment by encapsulating data about existing and new information into objects (16). The process of encapsulating the information requires extracting metadata from the information and creating from the metadata a database (30), where the metadata is grouped into objects (26) and groups of objects which are logically associated into collections (28). This database of objects and collections is instantiated into the runtime memory of a server (22), organized into repositories (24) of objects (20) and collections (28). A user (12) seeking access to the information would then, using an HTTP-compliant browser (20), access the server (22) to reach the information through the objects (26) created and stored in the server.

    Other inventors
    See patent
  • Method for enforcing the serialization of global multidatabase transactions through committing only on consistent subtransaction serialization by the local database managers

    Issued US 5241675

    Our invention guarantees global serializability by preventing multidatabase transactions from being serialized in different ways at the participating local database systems (LDBSs). In one embodiment, tickets are used to inform the MDBS of the relative serialization order of the subtransactions of each global transaction at each LDBS. A ticket is a (logical) timestamp whose value is stored as a regular data item in each LDBS. Each subtransaction of a global transaction is required to issue the take-a-ticket operation, which consists of reading the value of the ticket (i.e., read ticket) and incrementing it (i.e., write (ticket+1)) through regular data manipulation operations. Only the subtransactions of global transactions take tickets. When different global transactions issue subtransactions at a local database, each subtransaction will include the take-a-ticket operations. Therefore, the ticket values associated with each global subtransaction at the MDBS reflect the local serialization order at each LDBS. The MDBS in accordance with our invention examines the ticket values to determine the local serialization order at the different LDBSs and only authorizes the transactions to commit if the serialization order of the global transactions is the same at each LDBS. In another embodiment, the LDBSs employ rigorous schedulers and the prepared-to-commit messages for each subtransaction are used by the MDBS to ensure global serializability.
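
    The take-a-ticket mechanism is concrete enough to render in a few lines: each LDBS keeps a ticket counter, every global subtransaction reads and increments it, and the MDBS authorizes commit only when the relative ticket orders agree at all sites. The following is a toy illustration, not a real transaction manager.

        # Toy take-a-ticket check for global serializability.
        tickets = {"LDBS1": 0, "LDBS2": 0}   # one ticket data item per LDBS
        observed = {}                        # (txn, ldbs) -> ticket value read

        def take_a_ticket(txn, ldbs):
            observed[(txn, ldbs)] = tickets[ldbs]   # read(ticket)
            tickets[ldbs] += 1                      # write(ticket + 1)

        def orders_agree(txns, sites):
            orders = [sorted(txns, key=lambda t: observed[(t, s)]) for s in sites]
            return all(order == orders[0] for order in orders)

        # T1 before T2 at LDBS1, but T2 before T1 at LDBS2 -> must not commit.
        take_a_ticket("T1", "LDBS1"); take_a_ticket("T2", "LDBS1")
        take_a_ticket("T2", "LDBS2"); take_a_ticket("T1", "LDBS2")
        print(orders_agree(["T1", "T2"], ["LDBS1", "LDBS2"]))  # False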

    Other inventors
    See patent
  • Topic-specific sentiment extraction

    Filed US US20140358523 A1

    One or more embodiments of techniques or systems for sentiment extraction are provided herein. From a corpus or group of social media data which includes one or more expressions pertaining to a topic, target topic, or a target, one or more candidate expressions may be extracted. Relationships between one or more pairs of candidate expressions may be identified or evaluated. For example, a consistency relationship or an inconsistency relationship between a pair may be determined. A root word database may include one or more root words which facilitate identification of candidate expressions. Among one or more of the root words may be seed words, which may be associated with a predetermined polarity. To this end, polarities may be determined based on a formulation which assigns polarities to a sentiment expression, candidate expressions, or an expression as a constrained optimization problem.
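
    To give a feel for the formulation, the sketch below assigns polarities from seed words under pairwise consistency/inconsistency relations. For brevity it relaxes the constrained-optimization problem to simple iterative propagation, and every word and relation is invented for illustration.

        # Polarity propagation from seed words over consistency (+1) and
        # inconsistency (-1) relations; a relaxation of the patent's
        # constrained-optimization formulation.
        SEEDS = {"great": 1.0, "awful": -1.0}
        RELATIONS = [("great", "smooth", +1), ("smooth", "laggy", -1)]

        def propagate(iterations=10):
            polarity = dict(SEEDS)
            for a, b, _ in RELATIONS:
                polarity.setdefault(a, 0.0)
                polarity.setdefault(b, 0.0)
            for _ in range(iterations):
                for a, b, sign in RELATIONS:
                    if b not in SEEDS:
                        polarity[b] = 0.5 * polarity[b] + 0.5 * sign * polarity[a]
                    if a not in SEEDS:
                        polarity[a] = 0.5 * polarity[a] + 0.5 * sign * polarity[b]
            return polarity

        print(propagate())  # 'smooth' drifts positive; 'laggy' drifts negative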

    Other inventors
    See patent

Projects

  • MTSS AI Concierge - Custom, Compact and NeuroSymbolic AI Model

    This is a part of one of several projects related to mental health. More at: https://wiki.aiisc.ai/index.php?title=Mental_Health_Projects

    Other creators
  • Nourish Co-pilot: Custom, Compact and NeuroSymbolic Diet AI Model

    Personalizing recipes to suit one’s needs is challenging. It involves studying the multifaceted context of a recipe, such as the nutrition of the ingredient, the health label of the ingredient (vegan, lactose-free), the suitability of an ingredient to a health condition (potato has a high GI compared to broccoli), the formation of harmful compounds due to the impact of cooking methods on ingredients, and the nutrition retention of ingredients after cooking. To solve these challenges, we propose a custom, compact, on-demand neurosymbolic model powered by a co-pilot that can analyze and recommend recipes to individuals with diabetes.

    Other creators
  • SmartPilot: A Custom-Compact NeuroSymbolic Co-Pilot for Next-Gen Manufacturing

    This is part of our work on Smart Manufacturing: https://wiki.aiisc.ai/index.php?title=Smart_manufacturing

    Other creators
  • Context-Aware Harassment Detection on Social Media

    - Present

    As social media permeates our daily life, there has been a sharp rise in the use of social media to humiliate, bully, and threaten others, which has come with harmful consequences such as emotional distress, depression, and suicide. The October 2014 Pew Research survey shows that 73% of adult Internet users have observed online harassment and 40% have experienced it. Among those who have experienced online harassment, 66% said their most recent incident occurred on a social networking site or app. Further, 25% of teens claim to have been cyberbullied. The prevalence and serious consequences of online harassment present both social and technological challenges.

    Existing work on harassment detection usually applies machine learning for binary classification, relying on message content while ignoring message context. Harassment is a pragmatic phenomenon, necessarily context-sensitive. We identify three dimensions of context for the harassment phenomenon on social media: people, content, and network. Focusing on content but ignoring either people (offender and victim) or network (social networks of offender and victim) yields misleading results. An apparent "bullying conversation" between good friends with sarcastic content presents no serious threat, while the same content from an identifiable stranger may function as harassment. Content analysis alone cannot capture these subtle but important distinctions.

    Social science research identifies some of the necessary harassment components and features typically ignored in the existing binary harassment-or-not computation: (1) aggressive/offensive language, (2) potentially harmful consequences to emotion, such as distress and psychological trauma, and (3) a deliberate intent to harm. This research reshapes social media harassment detection as a multi-dimensional analysis of the degree to which harassment occurs.
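
    To caricature the shift from a binary label to a degree of harassment that weighs all three context dimensions, consider the sketch below. The features and weights are illustrative assumptions, not the project's actual model.

        # Toy degree-of-harassment score combining content, people, and
        # network context (weights and features are invented).
        def harassment_score(content_offensiveness, are_friends, sender_anonymity):
            people = 0.1 if are_friends else 0.8   # friends' banter is discounted
            network = sender_anonymity             # anonymous strangers weigh more
            return round(0.5 * content_offensiveness + 0.3 * people + 0.2 * network, 2)

        # Identical offensive content, very different contexts:
        print(harassment_score(0.9, are_friends=True, sender_anonymity=0.1))   # 0.5
        print(harassment_score(0.9, are_friends=False, sender_anonymity=0.9))  # 0.87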

    Other creators
    See project
  • Hazards SEES: Social and Physical Sensing Enabled Decision Support for Disaster Management and Response

    - Present

    Infrastructure systems are a cornerstone of civilization. Damage to infrastructure from natural disasters such as an earthquake (e.g. Haiti, Japan), a hurricane (e.g. Katrina, Sandy) or a flood (e.g. Kashmir floods) can lead to significant economic loss and societal suffering. Human coordination and information exchange are at the center of damage control. This project seeks to radically reform decision support systems for managing rapidly changing disaster situations by the integrated exploitation of social, physical and hazard modeling capabilities.

    The team is designing novel, multi-dimensional cross-modal aggregation and inference methods to compensate for the uneven coverage of sensing modalities across an affected region. By assimilating data from social and physical sensors and their integrated modeling and analysis, methodology to predict and help prioritize the temporally and conceptually extended consequences of damage to people, civil infrastructure (transportation, power, waterways) and their components (e.g. bridges, traffic signals) will be designed. The team will develop innovative technology to support the identification of new background knowledge and structured data to improve object extraction, location identification, correlation or integration of relevant data across multiple sources and modalities (social, physical and Web). Novel coupling of socio-linguistic and network analysis will be used to identify important persons and objects, statistical and factual knowledge about traffic and transportation networks, and their impact on hazard models (e.g. storm surge) and flood mapping. Domain-grounded mechanisms will be developed to address pervasive trustworthiness and reliability concerns.

    Other creators
    See project
  • Project Safe Neighborhood

    - Present

    Project Safe Neighborhood: Westwood Partnership to Prevent Juvenile Repeat Offenders is an interdisciplinary project involving the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) – Wright State University with other community partners including the City of Dayton (Dayton Police Department), Montgomery County Juvenile Justice and University of Dayton to prevent juvenile repeat offenders from committing crime in the Westwood neighborhood located in the City of Dayton, Ohio.

    Objectives of this project include:

    * Research and develop the criteria for identifying the most at-risk youth
    * Establish the best practices for bringing all resources to a common focus for these youth
    * Provide evidence-based strategies to address the pattern of crime in the Westwood neighborhood and measure the effectiveness of those strategies by a number of methods, including the use of social media
    * Increase the use of law enforcement home visits in the targeted neighborhood
    * Enhance both the services and the sanctions made available through the juvenile justice system

    Other creators
    See project
  • Recommendations Using Hierarchical Knowledge Bases

    - Present

    Personalization and recommendations are the focus of today's commercial systems to increase user engagement in the era of Big Data. Efficient identification of user interests is pivotal to the success of content-based recommendation systems. In this work, we explore a crowd-sourced structured knowledge base - Wikipedia - through an adaptation of spreading activation theory to identify concepts interesting to a user. We then rank the entities through the interest hierarchy extracted from the knowledge base. In our evaluation of movie recommendations, we observed that our approach performs better compared to other relevant systems and addresses the data sparsity problem.

    Other creators
  • NIDA National Early Warning System Network (iN3)

    - Present

    To accelerate the response to emerging drug abuse trends, this NIH-funded study (9/15/14 – 9/14/15) is designed to establish iN3, an innovative NIDA National Early Warning System Network that will rapidly identify, evaluate, and disseminate information on emerging drug use patterns. Two synergistic data streams will be used to identify emerging patterns of drug use. The first data stream will be derived from the Toxicology Investigators Consortium (“ToxIC”), a network of medical toxicologists who specialize in recognizing and confirming sentinel events involving psychoactive substances. ToxIC investigators are located at 42 sites across the U.S., and of these, we have selected 11 to serve as sentinel surveillance sites. The research team will analyze reports from ToxIC investigators’ assessments of patients with acute, subacute, and chronic effects of emerging drug use. The second data stream involves measures of drug use derived from social media (Twitter feeds and web forums).

    Other creators
    See project
  • FACeted Entity Summarization - FACES

    - Present

    We explore three dimensions in creating entity summaries (in knowledge bases and graphs): uniqueness, popularity, and diversity.

    Other creators
    See project
  • Trending: Social media analysis to monitor cannabis and synthetic cannabinoid use (eDrugTrends)

    The ultimate goal of this proposal is to decrease the burden of psychoactive substance use in the United States. Building on a longstanding multidisciplinary collaboration between researchers at the Center for Interventions, Treatment, and Addictions Research (CITAR) and the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) at Wright State University, we are developing and deploying an innovative software platform, eDrugTrends, capable of semi-automated processing of social media data to identify emerging trends in cannabis and synthetic cannabinoid use.

    Cannabis remains one of the most commonly used psychoactive substances in the U.S., and current epidemiological studies indicate broadening acceptability. Over the past several years, synthetic cannabinoids (“synthetics,” such as Spice, K2) have emerged as new designer drugs. Synthetics, after gaining popularity as “legal” alternatives to cannabis, have been associated with adverse health effects such as seizures and changes in mental status requiring ICU admission. In the context of profound changes in cannabis legalization policies that are taking place across the U.S., close epidemiological monitoring of natural and synthetic cannabinoid products is needed to assess the impact of policy changes and identify emerging issues and trends.

    Specific Aims:

    -- Develop a comprehensive software platform, eDrugTrends, for semi-automated processing and visualization of thematic, sentiment, spatio-temporal, and social network dimensions of social media data (Twitter and Web forums) on cannabis and synthetic cannabinoid use.
    -- Identify and compare trends in knowledge, attitudes, and behaviors related to cannabis and synthetic cannabinoid use across U.S. regions with different cannabis legalization policies using Twitter and Web forum data.
    -- Analyze social network characteristics and identify key influencers (opinion leaders) in cannabis and synthetic cannabinoid-related discussions on Twitter.

    Other creators
    See project
  • Automated Clinical Document Improvement

    - Present

    Quality of patient health record documentation is critical for individuals, hospitals, insurance companies, and compliance/regulatory agencies for reasons such as reimbursements, fraud detection, and clinical research. CDI (Clinical Document Improvement) personnel play a vital role in assuring document quality standards through a well-established querying process for clarification in the documents. In an ongoing collaboration with ezDI (http://ezdi.us), our collaborator and sponsor, we are experimenting with techniques from data mining and statistics to help automate the process of identifying discrepancies in clinical data documentation. We combine the domain's knowledge base and machine learning techniques to find the importance of various types of attributes and to learn the relationships among them. These attributes and relationships enable us to find missing concepts or documentation discrepancies in a clinical chart.

    Other creators
  • Location Prediction of Twitter Users

    The geographic location of a Twitter user can be used in many applications such as personalization and recommendation systems. This work explores the use of an external knowledge base (Wikipedia) to predict the location of a Twitter user based on the contents of their tweets and compares this approach to existing statistical approaches. The key contribution of this work is that it does not require a training data set of geo-tagged tweets as used by the state-of-the-art approaches.
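
    A toy version of the idea: score each candidate city by how many Wikipedia-derived local entities appear in a user's tweets, with no geo-tagged training data. The city-entity lists below are made up for illustration.

        # Knowledge-base-driven location scoring (toy entity lists).
        CITY_ENTITIES = {
            "Chicago": {"wrigley field", "the bean", "cubs"},
            "New York": {"times square", "yankees", "central park"},
        }

        def predict_city(tweets):
            text = " ".join(tweets).lower()
            scores = {city: sum(entity in text for entity in entities)
                      for city, entities in CITY_ENTITIES.items()}
            return max(scores, key=scores.get), scores

        print(predict_city(["Great day at Wrigley Field!", "Go Cubs!"]))
        # ('Chicago', {'Chicago': 2, 'New York': 0})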

    Other creators
    See project
  • Mining Societal Attitudes and Beliefs about Gender-based Violence (GBV)

    - Present

    With SMEs from UNFPA, we have launched this initiative for data-driven insights to curb #GenderViolence by assessing the role of #BigData in policy interventions and anti-GBV campaigns.
    Motivation: Humanitarian and public institutions increasingly face the problem of mining Big Data from social media sites to measure public attitudes and enable timely public engagement. Such engagement supports the exploration of public views on important social issues such as GBV. We are studying Big (Social) Data to analyze public opinion, attitudes, and beliefs regarding GBV, highlighting the nature of online content posting practices by geographical location and gender. The exploitation of Big Data requires the techniques of Computational Social Science to mine insight from the corpus while accounting for the influence of both transient events and sociocultural factors. This research has implications for revealing public awareness regarding GBV tolerance and suggests opportunities for intervention and the measurement of intervention effectiveness, assisting both governmental and non-governmental organizations in policy development.

    Other creators
    See project
  • kHealth - Knowledge-enabled Healthcare

    - Present

    kHealth – Knowledge-enabled Healthcare is a platform which integrates data from passive and active sensing (including both machine and human sensors) with background knowledge from domain ontologies, semantic reasoning, and mobile computing environments to help people make decisions to improve health, fitness, and wellbeing. kHealth utilizes technology from Semantic Sensor Web, Semantic Perception, and Intelligence at the Interface to enable advanced healthcare applications. So far we have developed sensor-mobile app kits for the following clinical applications:

    - to control and predict risk for asthma in children (with Dayton Children's Hospital)
    http://wiki.knoesis.org/index.php/Asthma [NIH funded R01 project starting July 2016]
    - to predict risk of adverse events for dementia patients (collaboration: Dept. of Geriatrics, WSU Boonshoft School of Medicine) http://wiki.knoesis.org/index.php/Dementia
    - to reduce preventable hospital readmissions of patients with Acute Decompensated Heart Failure (collaboration: OSU Wexner Medical Center)

    Evaluations under clinical supervision and approved IRBs are ongoing in asthma and dementia.

    Personalized Digital Health Research and Applications at Kno.e.sis: http://youtu.be/mATRAQ90wio

    Other creators
    See project
  • Obvio

    - Present

    Obvio (Spanish for 'obvious') is the name of the project on semantics-based techniques for Literature-Based Discovery (LBD) using biomedical literature. The goal of Obvio is to uncover hidden connections between concepts in text, thereby leading to hypothesis generation from publicly available scientific knowledge sources. It utilizes semantic predications (assertions extracted from biomedical literature) for Literature-Based Discovery (LBD).

    Other creators
    See project
  • Continuous Semantics and Real-time Analysis of Social and Sensor Data

    - Present

    We’ve made significant progress in applying semantics and Semantic Web technologies in a range of domains. A relatively well-understood approach to reaping semantics’ benefits begins with formal modeling of a domain’s concepts and relationships, typically as an ontology. Then, we extract relevant facts (in the form of related entities) from the corpus of background knowledge and use them to populate the ontology. Finally, we apply the ontology to extract semantic metadata or to semantically annotate data in unseen or new corpora. Using annotations yields semantics-enhanced experiences for search, browsing, integration, personalization, advertising, analysis, discovery, situational awareness, and so on. This typically works well for domains that involve slowly evolving knowledge concentrated among deeply specialized domain experts and that have definable boundaries. However, this approach has difficulties dealing with the dynamic domains involved in social, mobile, and sensor webs. This project looks at how continuous semantics can help us model those domains and analyze the related real-time data typically found on social, mobile, and sensor webs, which exhibit five characteristics. First, they’re spontaneous (arising suddenly). Second, they follow a period of rapid evolution, involving real-time or near real-time data, which requires continuous searching and analysis. Third, they involve many distributed participants with fragmented and opinionated information. Fourth, they accommodate diverse viewpoints involving topical or contentious subjects. Finally, they feature context colored by local knowledge as well as perceptions based on different observations and their sociocultural analysis.

    Other creators
    See project
  • IntellegO - Semantic Perception Technology

    Currently, there are many sensors collecting information about our environment, leading to an overwhelming number of observations that must be analyzed and explained in order to achieve situation awareness. As perceptual beings, we are also constantly inundated with sensory data; yet we are able to make sense out of our environment with relative ease. Semantic Perception is a computational framework, inspired by cognitive models of human perception, to derive actionable intelligence and situational awareness from low-level sensor data. The formalization of this ability utilizes prior knowledge encoded in domain ontologies, and hybrid abductive/deductive reasoning, to translate low-level observations into high-level abstractions. A declarative specification defined in OWL allows prior knowledge available on the Web, and annotated with Semantic Web languages, to be easily integrated into the framework.
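
    The abductive step can be caricatured as set covering: choose the smallest set of abstractions from background knowledge that explains every observation. The toy knowledge below stands in for a domain ontology; IntellegO's actual formalization is declarative OWL with hybrid abductive/deductive reasoning.

        # Minimal abduction-as-set-covering over toy background knowledge.
        from itertools import combinations

        EXPLAINS = {  # abstraction -> observable properties it accounts for
            "flu": {"fever", "cough", "fatigue"},
            "cold": {"cough", "sneezing"},
            "allergy": {"sneezing", "itchy_eyes"},
        }

        def explain(observed):
            names = list(EXPLAINS)
            for k in range(1, len(names) + 1):
                for combo in combinations(names, k):
                    covered = set().union(*(EXPLAINS[c] for c in combo))
                    if observed <= covered:
                        return combo
            return None

        print(explain({"fever", "cough"}))       # ('flu',)
        print(explain({"cough", "itchy_eyes"}))  # ('flu', 'allergy')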

    Other creators
    See project
  • Twarql

    - Present

    Twitter has become a prominent medium to share opinions, observations, and suggestions in real time. Insights from these microposts ("Wisdom of the Crowd") have proved to be invaluable for businesses and researchers around the world. However, the volume of microblog data published is increasing with the popularity and growth of Twitter. This has induced challenges in filtering microblog data to cater to the needs of aggregation and collective analysis for sensemaking. Twarql addresses these challenges by leveraging Semantic Web technologies to enable a flexible query language for filtering microblog posts.
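
    A minimal sketch of that idea, assuming the third-party rdflib library: once tweets are lifted to RDF triples, a SPARQL query filters them by structure instead of keywords. The vocabulary URIs are illustrative placeholders, not Twarql's actual schema.

        # Filter RDF-annotated tweets with SPARQL (placeholder vocabulary).
        from rdflib import Graph, Literal, Namespace, URIRef

        EX = Namespace("http://example.org/tweet/")
        g = Graph()
        t1 = URIRef("http://example.org/tweets/1")
        g.add((t1, EX.mentionsEntity, EX.iPhone))
        g.add((t1, EX.hasHashtag, Literal("#fail")))

        results = g.query("""
            PREFIX ex: <http://example.org/tweet/>
            SELECT ?tweet WHERE {
                ?tweet ex:mentionsEntity ex:iPhone ;
                       ex:hasHashtag "#fail" .
            }""")
        for row in results:
            print(row.tweet)  # http://example.org/tweets/1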

    Other creators
    See project
  • Twitris+: 360 degree Social Media Analytics platform

    - Present

    Users are sharing voluminous social data (800M+ active Facebook users, 1B+ tweets/week) through social networking platforms accessible via the Web and, increasingly, via mobile devices. This gives decision makers, from corporate analysts to coordinators during emergencies, an unprecedented opportunity to answer questions or take actions related to a broad variety of activities and situations: who should they really engage with, how to prioritize posts for action in the voluminous data stream, what are the needs and who are the resource providers in an emergency event, how is a corporate brand performing, and does customer support adequately serve the needs while managing corporate reputation. We demonstrate these capabilities using Twitris by multi-faceted real-time analysis along the dimensions of Spatio-Temporal-Thematic (STT), People-Content-Network (PCN), and Subjectivity: Emotion-Sentiment-Intent (ESI). Twitris' diversity and depth of analysis is unprecedented. Twitris v1 [2009] focused on STT, Twitris v2 [2011] focused on PCN, and Twitris v3 [2012- ] initiated ESI, extended other dimensions by augmenting PCN analysis with expression capability involving the use of background knowledge, and will soon add real-time analytics incorporating Kno.e.sis' Twarql framework.

    Twitris leverages an array of techniques and technologies that traditionally fall under big data (or scalable unstructured data analysis), social media analysis (including user-generated content analysis), and the Semantic Web (including extensive use of RDF), along with algorithms that draw on statistics, linguistics, machine learning, and complex/semantic query processing.

    Project alumni: Karthik Gomadam, Meena Nagarajan, Ashutosh Jadhav

    Research System (live): http://twitris.knoesis.org Project Page: http://wiki.knoesis.org/index.php/Twitris
    Twitris' progress towards commercialization: http://wiki.knoesis.org/index.php/Market_Driven_Innovations_and_Scaling_up_of_Twitris

    Other creators
    See project
  • Semantic Sensor Web

    - Present

    Millions of sensors around the globe currently collect avalanches of data about our environment. The rapid development and deployment of sensor technology involves many different types of sensors, both remote and in situ, with such diverse capabilities as range, modality, and maneuverability. It is possible today to utilize networks with multiple sensors to detect and identify objects of interest up close or from a great distance. The lack of integration and communication between these networks, however, often leaves this avalanche of data stovepiped and intensifies the existing problem of too much data and not enough knowledge. With a view to alleviating this glut, we propose that sensor data be annotated with semantic metadata to provide contextual information essential for situational awareness. In particular, Semantic Sensor Web is a framework for managing heterogeneity among sensor descriptions and sensor observation data through semantic modeling and annotation to enable advanced Web-based data integration, query, and inference. This project has helped to initiate a W3C Incubator Group, the Semantic Sensor Network XG, and develop a standard ontology and semantic annotation framework. These tools are achieving broad adoption and application within the sensing community for managing sensor data on the Web.
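
    To illustrate what semantic annotation of a single observation might look like, the snippet below (again assuming the rdflib library) attaches SSN-flavored metadata to one sensor reading. The URIs are simplified placeholders rather than the normative SSN ontology terms.

        # SSN-style annotation of one observation (placeholder URIs).
        from rdflib import Graph, Literal, Namespace, RDF

        S = Namespace("http://example.org/ssn-like#")
        g = Graph()
        obs = S.observation42
        g.add((obs, RDF.type, S.Observation))
        g.add((obs, S.observedProperty, S.AirTemperature))
        g.add((obs, S.observedBy, S.weatherStation7))
        g.add((obs, S.hasValue, Literal(23.5)))
        g.add((obs, S.atTime, Literal("2008-06-01T12:00:00Z")))
        print(g.serialize(format="turtle"))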

    Selected Publications
    - The SSN Ontology of the W3C Semantic Sensor Network Incubator Group (Journal of Web Semantics, 2012): http://knoesis.wright.edu/library/resource.php?id=1659
    - Semantic Sensor Network XG Final Report (W3C Incubator Group Report, 2011): http://www.knoesis.org/library/resource.php?id=1635
    - SemSOS: Semantic Sensor Observation Service (CTS, 2009): http://knoesis.wright.edu/library/resource.php?id=00596
    - Semantic Sensor Web (IEEE Internet Computing, 2008): http://knoesis.wright.edu/library/resource.php?id=00311

    Other creators
    See project
  • Advanced School on Service Oriented Computing

    The Advanced School on Service-Oriented Computing (SOC) brings together the best international experts on software and services with PhD students, young researchers, and professionals from leading academic, research, and industrial organizations across Europe and around the world. Students who attend the prestigious Erasmus Mundus International Master on Service Engineering (IMSE) participate in the Advanced School as part of their study program. Topics span the entire field of SOC, from conceptual foundations to industrial applications.
    In addition to high-quality training, the Advanced School helps forge a new research and scientific community on Service-Oriented Computing (SOC). The Advanced School fosters the free exchange of ideas and helps the participants network and start new cooperative research projects. The School Directors are internationally known experts and researchers on SOC. This year the major themes of the Advanced School on SOC are: Conceptual Foundations, Computing in the Clouds, People in SOCs, and Emerging Topics.

    Other creators
    See project
  • Traffic Analytics using Textual and Sensor Data

    -

    Traffic congestion has become a major issue in many cities around the world. At Kno.e.sis, researchers work on understanding city issues such as traffic problems to provide insights to decision/policy makers. We pursue this understanding using a unique approach of processing both machine sensor data and citizen sensor data related to traffic. Citizen sensor observations complement or corroborate machine sensor observations and, when processed together, lead to deeper insights into a Cyber-Physical-Social system like a city.

    Other creators
    See project
  • Semantic Platform for Open Materials Science and Engineering

    -

    Innovations in materials play an essential role in our progress towards a better life - from improving laptop battery life to developing protective gear that prevents life-threatening injuries and making aircraft more efficient. However, it often takes 20 years from the time of discovery to when a new material is put into practical applications. The White House's Materials Genome Initiative (MGI; http://www.whitehouse.gov/mgi/) seeks to improve the US' competitiveness in the 21st century by discovering, manufacturing, and deploying advanced materials twice as fast, at a fraction of the cost. Kno.e.sis' two related projects [1][2] involve collaboration between computer and materials scientists, and will play a central role in developing the Digital Data component of MGI's Materials Innovation Infrastructure.

    Alumni: Maryam Panahiazar

    [1] Federated Semantic Services Platform for Open Materials Science and Engineering
    [2] Materials Database Knowledge Discovery and Data Mining

    Other creators
    See project
  • SoCS: Social Media Enhanced Organizational Sensemaking in Emergency Response

    -

    Online social networks and always-connected mobile devices have empowered citizens and organizations to communicate and coordinate effectively in the wake of critical events. Specifically, there have been many examples of using Twitter to provide timely, situational information about emergencies to relief organizations and to conduct ad hoc coordination. This NSF-sponsored multidisciplinary research, involving computer scientists and cognitive scientists at Wright State University and Ohio State University, seeks to understand the full ramifications of using social networks for effective organizational sensemaking in such contexts.

    This project is expected to have a significant impact in the specific context of disaster and emergency response. However, elements of this research are expected to have much wider utility, for example in the domains of e-commerce and social reform. From a computational perspective, this project introduces the novel paradigm of spatio-temporal-thematic (STT) and people-content-network analysis (PCNA) of social media and traditional media content, implemented as part of Twitris (http://twitris.knoesis.org). Applications of STT and PCNA extend well beyond organizational sensemaking. For social scientists, the project provides a platform that can be used to assess the relative efficacy of various organizational structures and is expected to provide new insights into the types of social network structures (a mix of symmetric and asymmetric) that might be better suited to propagating information in emergent situations. From an educational standpoint, the majority of funds will be used to train the next generation of interdisciplinary researchers drawn from the computational and social sciences.
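
    To make the STT idea concrete, here is a deliberately simplified sketch (the tweet fields, themes, and keywords are invented; the actual Twitris pipeline is far richer):

        # Hypothetical sketch of spatio-temporal-thematic (STT) grouping of tweets.
        from collections import Counter
        from datetime import datetime

        tweets = [
            {"text": "Power lines down near 5th St", "state": "OH",
             "time": "2012-06-29T18:05:00"},
            {"text": "Need water and shelter downtown", "state": "OH",
             "time": "2012-06-29T18:40:00"},
        ]
        themes = {"damage": ["power", "down"], "needs": ["water", "shelter", "need"]}

        def stt_keys(tweet):
            """Map one tweet to the (state, hour, theme) keys it matches."""
            hour = datetime.fromisoformat(tweet["time"]).strftime("%Y-%m-%d %H:00")
            text = tweet["text"].lower()
            return [(tweet["state"], hour, name)
                    for name, kws in themes.items() if any(k in text for k in kws)]

        counts = Counter(key for t in tweets for key in stt_keys(t))
        print(counts)  # e.g. Counter({('OH', '2012-06-29 18:00', 'damage'): 1, ...})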

    Keywords: Social Networking, Emergency Response, People-Content-Network Analysis (PCNA), Spatio-Temporal-Thematic Analysis (STT Analysis), Organizational Sensemaking, Collaborative Decision Making.

    Project Site: http://knoesis.org/research/semsoc/projects/socs

    Other creators
    See project
  • PREDOSE: PREscription Drug abuse Online-Surveillance and Epidemiology project

    -

    The NIH-funded PREDOSE is an interdisciplinary collaborative project between the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) and the Center for Interventions, Treatment and Addictions Research (CITAR) at Wright State University. The overall aim of PREDOSE is to develop automated techniques for analyzing web forum data related to the illicit use of pharmaceutical opioids. This research complements traditional epidemiological studies involving interview-based data gathering. Many Web 2.0-empowered social platforms, including web forums and Twitter, provide venues for individuals to freely share their experiences, post questions, and offer comments about different drugs. PREDOSE aims to analyze such social data to provide timely, emerging information on the non-medical use of pharmaceutical opioids. Primary goals include:

    • To determine user knowledge, attitudes, and behavior related to the non-medical use of pharmaceutical opioids (namely buprenorphine) as discussed on social platforms

    • To determine spatio-temporal trends and patterns in pharmaceutical opioid abuse as discussed on web-based forums

    The project has already yielded unusual and unexpected insights, such as the self-treatment of opioid withdrawal symptoms with loperamide.
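
    As a toy illustration of the surveillance idea (PREDOSE's actual information extraction is far more sophisticated; the lexicon and posts below are invented), lexicon-based matching over time-stamped forum posts might look like:

        # Hypothetical sketch: lexicon-based matching of drug mentions in posts.
        import re

        lexicon = {
            "buprenorphine": ["buprenorphine", "suboxone", "subs", "bupe"],
            "loperamide": ["loperamide", "imodium"],
        }
        patterns = {drug: re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.I)
                    for drug, terms in lexicon.items()}

        posts = [
            ("2011-03-02", "Used imodium to get through withdrawal this weekend"),
            ("2011-03-05", "Anyone taper off subs this fast?"),
        ]

        for date, text in posts:
            hits = [drug for drug, pat in patterns.items() if pat.search(text)]
            if hits:
                print(date, hits)  # (date, drug) pairs feed spatio-temporal trend analysis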

    Other creators
    See project
  • Ontology Concept Elicitation Tool (OnCET)

    -

    In an attempt to capture the engineering process of materials design and make the data available in the form of an ontology, we created the Ontology Concept Elicitation Tool (OnCET). Information gathered through this web application can be exported to triple stores to enhance materials science literature search capabilities.
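
    As a minimal sketch of the export step (the namespace and terms below are placeholders, not OnCET's actual schema), elicited concepts can be written out as RDF with rdflib and then loaded into a triple store:

        # Hypothetical sketch: export an elicited concept as RDF triples.
        from rdflib import Graph, Literal, Namespace
        from rdflib.namespace import RDF, RDFS

        MAT = Namespace("http://example.org/materials#")  # placeholder namespace
        g = Graph()
        g.bind("mat", MAT)

        # One elicited concept: a heat-treatment step in an alloy design process.
        g.add((MAT.HeatTreatment, RDF.type, RDFS.Class))
        g.add((MAT.step42, RDF.type, MAT.HeatTreatment))
        g.add((MAT.step42, RDFS.label, Literal("Solution treatment at 980 C")))

        print(g.serialize(format="turtle"))  # Turtle output, ready for a triple store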

    Other creators
  • PhylOnt: A Domain-Specific Ontology for Phylogenetic Analysis

    -

    PhylOnt is a collaboration project with the University of Georgia. The specific objective of this research was to develop and deploy an ontology for a novel ontology-driven semantic problem-solving approach in phylogenetic analysis and the downstream use of phylogenetic trees. This is a foundation for an integrated platform for phylogenetically based comparative analysis and data integration. PhylOnt is an extensible ontology that describes the methods employed to estimate trees given a data matrix, the models and programs used for phylogenetic analysis, and descriptions of phylogenetic trees, including branch-length information and support values. It also describes provenance information for phylogenetic analysis data, such as information about publications and studies related to phylogenetic analyses. To illustrate the utility of PhylOnt, we annotated scientific literature and files to support semantic search.
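
    The kind of semantic search such annotations enable can be sketched as follows (the class and property names here are invented stand-ins, not PhylOnt's actual terms):

        # Hypothetical sketch: query annotations for analyses using a given method.
        from rdflib import Graph, Literal, Namespace
        from rdflib.namespace import RDF, RDFS

        PHYL = Namespace("http://example.org/phylont#")  # placeholder namespace
        g = Graph()
        g.add((PHYL.study1, RDF.type, PHYL.PhylogeneticAnalysis))
        g.add((PHYL.study1, PHYL.usesMethod, PHYL.MaximumLikelihood))
        g.add((PHYL.study1, RDFS.label, Literal("Grass phylogeny study")))

        q = """
        SELECT ?study ?label WHERE {
          ?study a phyl:PhylogeneticAnalysis ;
                 phyl:usesMethod phyl:MaximumLikelihood ;
                 rdfs:label ?label .
        }"""
        for row in g.query(q, initNs={"phyl": PHYL, "rdfs": RDFS}):
            print(row.study, row.label)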

    Other creators
    See project
  • MobiCloud

    -

    MobiCloud is a domain-specific language (DSL)-based cloud-mobile hybrid application generation framework. The project won the prestigious Technology Award at the 2012 Fukuoka Ruby Award Competition (from among 82 entries from 9 countries).
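
    As a toy Python analogy only (the spec format and targets below are invented, not MobiCloud's actual DSL), the core idea of generating both mobile and cloud artifacts from one shared description can be sketched as:

        # Hypothetical sketch: one shared spec drives stubs for several targets.
        spec = {
            "app": "TaskManager",
            "model": {"Task": ["title", "done"]},
            "targets": ["android_stub", "cloud_api_stub"],
        }

        def generate(spec, target):
            """Emit a minimal code stub for one target from the shared spec."""
            model, fields = next(iter(spec["model"].items()))
            if target == "android_stub":
                return f"// Android: activity listing {model} ({', '.join(fields)})"
            if target == "cloud_api_stub":
                return f"# Cloud: REST endpoints /{model.lower()}s with {fields}"
            raise ValueError(f"unknown target {target}")

        for t in spec["targets"]:
            print(generate(spec, t))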

    Other creators
    See project

Languages

  • Hindi

    Native or bilingual proficiency

  • Gujarati

    Native or bilingual proficiency

  • English

    Full professional proficiency

Recommendations received

10 people have recommended Amit
