##### #1. DAOC: Stable Clustering of Large Networks
###### Artem Lutov, Mourad Khayati, Philippe Cudré-Mauroux
Clustering is a crucial component of many data mining systems involving the analysis and exploration of various data. Data diversity calls for clustering algorithms to be accurate while providing stable (i.e., deterministic and robust) results on arbitrary input networks. Moreover, modern systems often operate with large datasets, which implicitly constrains the complexity of the clustering algorithm. Existing clustering techniques are only partially stable, however, as they guarantee either determinism or robustness. To address this issue, we introduce DAOC, a Deterministic and Agglomerative Overlapping Clustering algorithm. DAOC leverages a new technique called Overlap Decomposition to identify fine-grained clusters in a deterministic way capturing multiple optima. In addition, it leverages a novel consensus approach, Mutual Maximal Gain, to ensure robustness and further improve the stability of the results while still being capable of identifying micro-scale clusters. Our empirical results on both synthetic and real-world...
##### #2. Moments of Uniform Random Multigraphs with Fixed Degree Sequences
###### Philip S. Chodrow
We study the expected adjacency matrix of a uniformly random multigraph with fixed degree sequence $\mathbf{d}$. This matrix arises in a variety of analyses of networked data sets, including modularity-maximization and mean-field theories of spreading processes. Its structure is well-understood for large, sparse, simple graphs: the expected number of edges between nodes $i$ and $j$ is roughly $\frac{d_id_j}{\sum_\ell{d_\ell}}$. Many network data sets are neither large, sparse, nor simple, and in these cases the approximation no longer applies. We derive a novel estimator using a dynamical approach: the estimator emerges from the stationarity conditions of a class of Markov Chain Monte Carlo algorithms for graph sampling. Nonasymptotic error bounds are available under mild assumptions, and the estimator can be computed efficiently. We test the estimator on a small network, finding that it enjoys relative bias against ground truth a full order of magnitude smaller than the standard expression. We then compare modularity maximization...
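As a minimal sketch of the standard approximation this abstract refers to, the matrix with entries $\frac{d_id_j}{\sum_\ell{d_\ell}}$ can be computed directly from a degree sequence. The function name `expected_edges_standard` and the example degree sequence are illustrative assumptions, not taken from the paper, and the paper's own estimator is not reproduced here.

```python
def expected_edges_standard(degrees):
    """Return the matrix of d_i * d_j / sum(d) for a degree sequence.

    This is only the standard large-sparse-graph approximation quoted in
    the abstract, not the estimator the paper derives.
    """
    total = sum(degrees)
    if total == 0:
        raise ValueError("degree sequence must have a positive sum")
    return [[d_i * d_j / total for d_j in degrees] for d_i in degrees]


# Hypothetical degree sequence, used only for illustration.
example_degrees = [2, 1, 1]
approx = expected_edges_standard(example_degrees)
for row in approx:
    print(row)
```

Each row of the resulting matrix sums to the degree of the corresponding node, which is one quick sanity check on the formula.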
##### #3. Ensemble approach for generalized network dismantling
###### Xiao-Long Ren, Nino Antulov-Fantulin
Finding a set of nodes in a network whose removal fragments the network below some target size at minimal cost is called the network dismantling problem, and it belongs to the NP-hard computational class. In this paper, we explore the (generalized) network dismantling problem using a spectral approximation with a variant of the power-iteration method. In particular, we explore the network dismantling solution landscape by creating an ensemble of possible solutions from different initial conditions and different numbers of iterations of the spectral approximation.
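The idea of building an ensemble from different initial conditions and iteration counts can be sketched with plain power iteration. This is a simplified stand-in: the abstract mentions a variant of the method and does not specify the operator, so the adjacency matrix, the function names, and the parameter values below are all assumptions made for illustration.

```python
import random


def power_iteration(matrix, num_iters, seed):
    """Approximate the dominant eigenvector of a square matrix.

    Plain power iteration from a random positive start vector; the paper's
    variant is not specified in the abstract and is not reproduced here.
    """
    rng = random.Random(seed)
    n = len(matrix)
    vec = [rng.random() + 1e-9 for _ in range(n)]
    for _ in range(num_iters):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0:
            raise ValueError("iteration collapsed to the zero vector")
        vec = [x / norm for x in nxt]
    return vec


def rayleigh_quotient(matrix, vec):
    """Eigenvalue estimate v^T M v / v^T v for the current vector."""
    n = len(matrix)
    mv = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    return sum(v * m for v, m in zip(vec, mv)) / sum(v * v for v in vec)


def solution_ensemble(matrix, seeds, iteration_counts):
    """One approximate eigenvector per (seed, iteration count) pair."""
    return [
        (seed, iters, power_iteration(matrix, iters, seed))
        for seed in seeds
        for iters in iteration_counts
    ]


# Hypothetical 4-node graph: a triangle (0, 1, 2) with a pendant node 3.
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
for seed, iters, vec in solution_ensemble(adjacency, [0, 1], [5, 50]):
    print(seed, iters, round(rayleigh_quotient(adjacency, vec), 4))
```

Runs with few iterations still depend on the starting vector, which is what makes the members of the ensemble differ from one another.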
##### #4. Two Computational Models for Analyzing Political Attention in Social Media
###### Libby Hemphill, Angela M. Schöpke-Gonzalez
Understanding how political attention is divided and over what subjects is crucial for research on areas such as agenda setting, framing, and political rhetoric. Existing methods for measuring attention, such as manual labeling according to established codebooks, are expensive and can be restrictive. We describe two computational models that automatically distinguish topics in politicians' social media content. Our models---one supervised classifier and one unsupervised topic model---provide different benefits. The supervised classifier reduces the labor required to classify content according to a pre-determined topic list. However, tweets do more than communicate policy positions. Our unsupervised model uncovers both political topics and other Twitter uses (e.g., constituent service). These models are effective, inexpensive computational tools for political communication and social media research. We demonstrate their utility and discuss the different analyses they afford by applying both models to the tweets posted by members of...
##### #5. What matters, context or sentiment?: Analysing the influence of news in U.S. elections using Natural Language Processing
###### Federico Albanese, Sebastián Pinto, Viktoriya Semeshenko, Pablo Balenzuela
A key question in the analysis of collective social behaviour is whether and how mass media can influence public opinion. In this paper, we quantitatively explore the relation between a specific manifestation of public opinion, the intention to vote for a given candidate, and the information related to the candidates in the mass media. We analyse the political news articles related to the US presidential campaign during the year 2016, using techniques of natural language processing. We applied recursive deep models for semantic composition over a sentiment treebank to detect the sentiment of sentences, and topic detection methods to characterise how media outlets get involved in the election coverage. The results of the analysis were compared to the outcomes of political polls to determine which of these two aspects of the information has more influence and whether there exists any causal relationship between them. Our results suggest that the sentiment content of the news by itself is not enough to...
##### #6. ASSED -- A Framework for Identifying Physical Events through Adaptive Social Sensor Data Filtering
###### Abhijit Suprem, Calton Pu
Physical event detection has long been the domain of static event processors operating on numeric sensor data. This works well for large-scale strong-signal events such as hurricanes, and important classes of events such as earthquakes. However, for a variety of domains there is insufficient sensor coverage, e.g., landslides, wildfires, and flooding. Social networks have provided a massive volume of data from billions of users, but data from these generic social sensors contain much more noise than data from physical sensors. One of the most difficult challenges presented by social sensors is *concept drift*, where the terms associated with a phenomenon evolve and change over time, rendering static machine learning (ML) classifiers less effective. To address this problem, we develop the ASSED (Adaptive Social Sensor Event Detection) framework with an ML-based event processing engine and show how it can perform simple and complex physical event detection on strong- *and* weak-signal with low-latency, high scalability, and...
##### #7. Community Detection Across Multiple Social Networks based on Overlapping Users
###### Ziqing Zhu, Tao Zhou, Chenghao Jia, Jiawei Liu, Jiuxin Cao
With the rapid development of Internet technology, online social networks (OSNs) have grown quickly and become increasingly popular. Meanwhile, research across multiple social networks is attracting more and more attention, and community detection across OSNs is an important part of it for online security problems, such as user behavior analysis and abnormal community discovery. In this paper, a community detection method across multiple social networks is proposed based on overlapping users. First, the concept of overlapping users is defined; then an algorithm, CMN NMF, is designed to discover the stub communities from overlapping users based on social relevance. After that, we extend each stub community in the different social networks by adding users with strong similarity, and in the end the different communities are extracted across networks. Experimental results on real data sets show that our method is more effective than other methods.
