Data & Knowledge Engineering

Action Conceptualization (Yu, Kaiqi)
In semantic analysis, verbs, rather than isolated nouns, are important hints for correctly understanding a whole sentence. A verb can select one sense of a multi-sense noun and, in the opposite direction, nouns constrain the usage of verbs. The goal of our project is to extract the underlying relations between verbs and nouns, abstract the noun arguments of verbs (we call this conceptualization), and build a verb-centered lexicon. The lexicon can be used in argument identification, action frame generation, and term similarity computation.

Mind Drifting (Keyang)
We propose to build a semantic network for concept association that models the human mind-drifting process, that is, how a person's thinking drifts from concept A to concept B, then from concept B to concept C, and so on. We try to discover the association chain and latent bridge concepts for each pair of concepts in the network and to measure how strong the association is. The discovered association chains, latent concepts, and association strength measurements have many applications in information retrieval, data mining, and natural language processing (NLP).
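As an illustration only, the idea of an association chain with bridge concepts can be sketched as a path search over a weighted concept graph. The sketch below assumes each edge carries an association strength in (0, 1] and that the strength of a chain is the product of its edge strengths; the function name `strongest_chain`, the toy graph, and this scoring rule are assumptions made for the example, not the project's actual model.

```python
import heapq
import math


def strongest_chain(graph, source, target):
    """Return (strength, chain) for the strongest chain from source to target.

    graph maps a concept to {neighbor: strength}, with strength in (0, 1].
    Maximizing a product of strengths equals minimizing the sum of
    -log(strength), so plain Dijkstra applies.
    """
    best = {source: 0.0}   # lowest known cost to reach each concept
    prev = {}              # back-pointers for rebuilding the chain
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for neighbor, strength in graph.get(node, {}).items():
            new_cost = cost - math.log(strength)
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                prev[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if target not in best:
        return 0.0, []  # no association chain exists
    chain = [target]
    while chain[-1] != source:
        chain.append(prev[chain[-1]])
    chain.reverse()
    return math.exp(-best[target]), chain


# Hypothetical toy network; the strengths are made up for illustration.
toy_graph = {
    "beach": {"sand": 0.8, "vacation": 0.5},
    "sand": {"desert": 0.6},
    "vacation": {"hotel": 0.7, "desert": 0.1},
    "desert": {"camel": 0.9},
}
strength, chain = strongest_chain(toy_graph, "beach", "camel")
print(chain, round(strength, 3))
```

In this toy graph, "sand" and "desert" act as the bridge concepts between "beach" and "camel".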

Question Answering (Kangqi, Yang, Jenny)
Automatic question answering is an open research topic in natural language processing. Each question contains a real-world relation between entities. Our project aims to uncover the semantic meaning behind a question, find the correct relation and its arguments, and finally transform them into a machine-readable representation that can be used to query structured knowledge bases, such as Freebase and YAGO, for the answer.

Source Code Topic Mining (Ke, Jacky)
This project is a collaboration with Oracle Australia. Oracle built a product named CodeMap, a scalable spatial visualisation for large codebases based on a world-map metaphor. The core idea is to map the continent/country/state/city/etc. hierarchy of a map to its equivalent in a code base, e.g. architectural component/package/class/method/etc. Our mining project aims to find related topics in a source code repository and to map the code data to concepts from a taxonomy. We try to build a graphical model of repositories and find the latent topics of the code.

High-order Graph-based Dependency Parser (Bean, Yizhong, Jia)
Our project aims to improve the performance of graph-based parsers by introducing high-order features without sacrificing efficiency. High-order features allow the parser to consider more context (e.g. siblings, grandparents) while parsing and scoring, but they also slow parsing down. How to trade off accuracy against speed is worth exploring.

Causality Extraction (Jessie, Yuchen)
Causality can help human reasoning and decision making. We extract causality (cause-effect pairs) from a web corpus. This work aims to extend the causal relations in WordNet.

Medical Information Extraction (Dong, Jack, Chen, Jinyi, David)
This project aims to extract structured data from records provided by AstraZeneca. We focus mainly on the history of present illness and the past history, extracting key information organized by time and thus producing several records from one case of illness.

PredicTV (Past)
A dynamic TV program recommender.

Wikification via Co-occurrence (Past)
A simple but surprisingly powerful framework for sense disambiguation using co-occurrences of Wikipedia links in the Wikipedia corpus.
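A minimal sketch of co-occurrence-based disambiguation, under stated assumptions: each page is reduced to the set of link targets it contains, and a candidate sense is scored by how often it co-occurs with the links found in the context. The function names, the toy pages, and the raw-count scoring are hypothetical choices for this example and are not taken from the original framework.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(pages):
    """Count how often two link targets appear on the same page."""
    counts = Counter()
    for links in pages:
        # Sort so each unordered pair is always stored under the same key.
        for a, b in combinations(sorted(set(links)), 2):
            counts[(a, b)] += 1
    return counts


def disambiguate(candidates, context, counts):
    """Pick the candidate sense that co-occurs most with the context links."""
    def score(candidate):
        return sum(counts[tuple(sorted((candidate, ctx)))] for ctx in context)
    return max(candidates, key=score)


# Hypothetical toy corpus: each inner list is the links found on one page.
pages = [
    ["Apple Inc.", "iPhone", "Steve Jobs"],
    ["Apple (fruit)", "Orchard", "Pear"],
    ["Apple Inc.", "Steve Jobs"],
]
counts = count_cooccurrences(pages)
sense = disambiguate(["Apple Inc.", "Apple (fruit)"], ["Steve Jobs", "iPhone"], counts)
print(sense)
```

Raw counts favor frequent link targets; a normalized measure would likely behave better on a real corpus, but that choice is outside what the description above specifies.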

Classified Image Search by Conceptualization (Past)
A prototype system called CISC that uses an external knowledge base to make better sense of the text signals. Once the semantics of the text are better understood, the clustering results improve significantly. In addition to clustering images by their semantic entities, our system can also conceptualize each image cluster into a set of concepts that represent the meaning of the cluster.

Top K List (Past)
Extraction of top-k lists from arbitrary web pages. This work is concerned with information extraction from top-k web pages, that is, web pages that describe the top k instances of a topic of general interest.

Set-valued data anonymization (Past)
A partial suppression framework for anonymizing set-valued data. Our experiments demonstrate the efficiency and effectiveness of the partial suppression technique.

Trajectory Inference Problem (Past)
A new security problem in which individuals' movement traces (in terms of accurate routes) can be inferred from just a series of mutual contact records and a map of the area in which they move around.

Programming Language

Probabilistic Programming (Zhuoyue, Quanjing)
Probabilistic programming languages are in the spotlight, with uses in machine learning, cognitive science, information retrieval, and artificial intelligence. Probabilistic programming aims to make the code of probabilistic models shorter, to reduce development time, to facilitate the construction of richer models, to lower the level of expertise required to build machine learning applications, and to support the construction of integrated models. We propose the Portable Probabilistic Programming Framework, which can be embedded in any commonly used programming language.

Speculative Nondeterminism (Past)
A new programmable concurrency control framework called speculative nondeterminism for real-time, "open", and distributed agents.

Rich-IP (Past)
We propose a high-level configurable description language that is automatically synthesized into IP core designs.