Category Archives: Publication

New NERDS paper on COVID genome sequencing

Our newest faculty hire Jonas L. Juul is already making a splash. He published a big multi-author paper in Nature Communications: High-resolution epidemiological landscape from ~290,000 SARS-CoV-2 genomes from Denmark, by M.P. Khurana et al.

We are happy that with Jonas, who was part of the Statens Serum Institut’s expert group on mathematical modeling of COVID-19 during the reopening of Denmark in the spring and summer of 2020, we have gained a solid footing in medical applications of data/network science.


We examined the drivers of molecular evolution and spread of 291,791 SARS-CoV-2 genomes from Denmark in 2021. With a sequencing rate consistently exceeding 60%, and up to 80% of PCR-positive samples between March and November, the viral genome set is broadly whole-epidemic representative. We identify a consistent rise in viral diversity over time, with notable spikes upon the importation of novel variants (e.g., Delta and Omicron). By linking genomic data with rich individual-level demographic data from national registers, we find that individuals aged <15 and >75 years had a lower contribution to molecular change (i.e., branch lengths) compared to other age groups, but similar molecular evolutionary rates, suggesting a lower likelihood of introducing novel variants. Similarly, we find greater molecular change among vaccinated individuals, suggestive of immune evasion. We also observe evidence of transmission in rural areas to follow predictable diffusion processes. Conversely, urban areas are expectedly more complex due to their high mobility, emphasising the role of population structure in driving virus spread. Our analyses highlight the added value of integrating genomic data with detailed demographic and spatial information, particularly in the absence of structured infection surveys.

New NERDS paper on network analysis of Italian music

A new NERDS-authored paper is out in Applied Network Science: Node attribute analysis for cultural data analytics: a case study on Italian XX–XXI century music, by M. Coscia


We use the Italian music record industry from 1902 to 2024 as a case study. In this scenario, a possible research objective could be to discuss the relationships between different music genres as they are performed by different bands. Estimating genre similarity by counting the number of records each band published performing a given genre is not enough, because it assumes bands operate independently from each other. In reality, bands share members and have complex relationships. These relationships cannot be automatically learned, both because we miss the data behind their creation and because they are established in a serendipitous way between artists, without following consistent patterns. However, we can map them in a complex network. We can then use the counts of band records with a given genre as a node attribute in a band network. In this paper we show how recently developed techniques for node attribute analysis are a natural choice to analyze such attributes. Alternative network analysis techniques focus on analyzing nodes, rather than node attributes, ending up either being inapplicable in this scenario, or requiring the creation of more complex n-partite high order structures that can be less intuitive. By using node attribute analysis techniques, we show that we are able to describe which music genres concentrate or spread out in this network, which time periods show a balance of exploration-versus-exploitation, which Italian regions correlate more with which music genres, and a new approach to classify clusters of coherent music genres or eras of activity by the distance on this network between genres or years.

Three new NERDS papers with our master students: Failing our youngest, superblockify, women on wikipedia

We have 3 new papers that came out over the summer so far, on diverse, very interesting topics. The first authors in all 3 of these papers were our master students – showing how impactful good master projects can be:

  1. Failing Our Youngest: On the Biases, Pitfalls, and Risks in a Decision Support Algorithm Used for Child Protection, by T.M. Hansen, R. Sinatra, and V. Sekara, published at FAccT’24
    Through a freedom of information request, we accessed a new algorithm of Danish child protection services to aid caseworkers in identifying children at heightened risk of maltreatment, named Decision Support, and conduct an audit. We find that the algorithm has significant methodological flaws, suffers from information leakage, relies on inappropriate proxy values for maltreatment assessment, generates inconsistent risk scores, and exhibits age-based discrimination. Given these serious issues, we strongly advise against the use of this kind of algorithm in local government, municipal, and child protection settings, and we call for rigorous evaluation of such tools before implementation and for continual monitoring post-deployment by listing a series of specific recommendations.

    See also our accompanying policy paper published earlier.
  2. superblockify: A Python Package for Automated Generation, Visualization, and Analysis of Potential Superblocks in Cities, by C.M. Büth, A. Vybornova, and M. Szell, published in The Journal of Open Source Software (JOSS)
    superblockify is a Python package designed to assist in planning future Superblock implementations by partitioning an urban street network into Superblock-like neighborhoods and providing tools for visualizing and analyzing these partition results. A Superblock is a set of adjacent urban blocks where vehicular through traffic is prevented or pacified, giving priority to people walking and cycling. The potential Superblock blueprints and descriptive statistics generated by superblockify can be used by urban planners as a first step in a data-driven planning pipeline for future urban transformations, or by urban data scientists as an efficient computational method to evaluate potential Superblock partitions.


    The software is available at: superblockify.city
  3. Traces of Unequal Entry Requirement for Illustrious People on Wikipedia Based on their Gender, by L. Krivaa and M. Coscia, published in Advances in Complex Systems
    In this paper, we study issues of fair gender representations for people in history noted by multiple language editions of Wikipedia: are women underrepresented on Wikipedia? We do so via a combination of natural language processing and network science. Our results indicate that there is indeed a higher bar for women to have their own biographical page on Wikipedia: women are only included when they have more significant connections than men to the rest of the network. There are visible effects of the initiatives Wikipedia is taking to fix this issue, showing that the gap is narrowing, which validates our interpretation of the data.

New NERDS paper on urban morphology & street network simplification

A new NERDS co-authored paper is out open-access in the Journal of Spatial Information Science (JOSIS): A shape-based heuristic for the detection of urban block artifacts in street networks, by Martin Fleischmann & Anastassia Vybornova.

a) Bridge, Amsterdam; b) Roundabout, Abidjan; c) Intersection, Kabul; d) Motorway, Vienna. Polygons classified as face artifacts are shown in red, and the OSM street network (without service roads) is shown in black. Face artifacts are polygons enclosed by street network geometries (in the case of OSM, lane centerlines) that do not represent morphological urban blocks, but instead are a result of detailed transportation-focused mapping of the streetscape. Map data (c) OpenStreetMap contributors (c) CARTO

We propose a cheap computational heuristic for the identification of ‘face artifacts’, i.e., geometries that are enclosed by transportation edges but do not represent urban blocks. Sounds cryptic? Just check out the picture – the artifacts (in red) might be painfully familiar to anyone who has worked with street network data. Our proposed heuristic, implemented open-source in momepy, is the first step towards a fully automated street network simplification workflow. Next steps coming up – stay tuned!

NERDS at ICWSM’24

This week, Arianna and Anders are representing NERDS at ICWSM in Buffalo, NY, with two freshly-published papers.

  1. Narratives of Collective Action in YouTube’s Discourse on Veganism, by A. Pera and L.M. Aiello. ICWSM’24.

    We studied vegan narratives on YouTube through the lens of a theoretical framework of moral narratives. We studied how different narratives elicit different types of responses from video commenters, and found that videos advocating social activism are the most effective at stirring reactions marked by heightened linguistic markers that relate to collective action.
  2. The Persuasive Power of Large Language Models, by A.G. Møller and L.M. Aiello. ICWSM’24.

    Can artificial agents interact with each other to reproduce human-like persuasive dialogue? And do the arguments they generate sound persuasive to humans? We used Llama2 to test different persuasion strategies, and asked humans to rate them. We found that arguments that included factual knowledge, markers of trust, expressions of support, and conveyed status were deemed most effective according to both humans and agents.

New NERDS paper out on Machine Learning in Humanitarian Work

We published a new paper:

The Opportunities, Limitations, and Challenges in Using Machine Learning Technologies for Humanitarian Work and Development, by V. Sekara, M. Karsai, E. Moro, D. Kim, E. Delamonica, M. Cebrian, M. Luengo-Oroz, R. Moreno Jimenez, M. Garcia-Herranz, published in Advances in Complex Systems

Novel digital data sources and tools like machine learning (ML) and artificial intelligence (AI) have the potential to revolutionize data about development and can contribute to monitoring and mitigating humanitarian problems. The potential of applying novel technologies to solving some of humanity’s most pressing issues has garnered interest outside the traditional disciplines studying and working on international development. Today, scientific communities in fields like Computational Social Science, Network Science, Complex Systems, Human Computer Interaction, Machine Learning, and the broader AI field are increasingly starting to pay attention to these pressing issues. However, are sophisticated data driven tools ready to be used for solving real-world problems with imperfect data and of staggering complexity? We outline the current state-of-the-art and identify barriers, which need to be surmounted in order for data-driven technologies to become useful in humanitarian and development contexts. We argue that, without organized and purposeful efforts, these new technologies risk at best falling short of promised goals, at worst they can increase inequality, amplify discrimination, and infringe upon human rights.

New NERDS paper out on success in tennis

We published a new multi-NERDS paper, concluding a successful previous internship of Chiara Zappalà!

Early career wins and tournament prestige characterize tennis players’ trajectories, by C. Zappalà, S. Sousa, T. Cunha, A. Pluchino, A. Rapisarda and R. Sinatra, published in EPJ Data Science


We study the unfolding of tennis players’ careers to understand the role of early career stages and the impact of specific tournaments on players’ trajectories. We employ a comprehensive approach combining network science and analysis of the Association of Tennis Professionals (ATP) tournament data and introduce a novel method to quantify tournament prestige based on the eigenvector centrality of the co-attendance network of tournaments. Focusing on the interplay between participation in central tournaments and players’ performance, we find that the level of the tournament where players achieve their first win is associated with becoming a top player. This work sheds light on the critical role of the initial stages in the progression of players’ careers, offering valuable insights into the dynamics of success in tennis.
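
The prestige measure mentioned above can be illustrated with a minimal sketch. It assumes that two tournaments are linked with a weight equal to the number of players who attended both, and it uses a plain power iteration for the eigenvector centrality. The player and tournament names, the weighting, and the function names are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def co_attendance_weights(attendance):
    """Build symmetric edge weights: number of players shared by two tournaments.

    attendance maps a player name to the set of tournaments they attended.
    """
    weights = {}
    for tournaments in attendance.values():
        for a, b in combinations(sorted(tournaments), 2):
            weights[(a, b)] = weights.get((a, b), 0) + 1
            weights[(b, a)] = weights.get((b, a), 0) + 1
    return weights


def eigenvector_centrality(nodes, weights, iterations=500, tol=1e-12):
    """Power iteration on (A + I); the identity shift avoids oscillation
    and leaves the leading eigenvector of A unchanged."""
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new = {
            n: score[n] + sum(weights.get((n, m), 0) * score[m] for m in nodes)
            for n in nodes
        }
        norm = sum(v * v for v in new.values()) ** 0.5
        new = {n: v / norm for n, v in new.items()}
        if max(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    return score


# Toy attendance data (hypothetical players and tournaments).
attendance = {
    "ana": {"Slam", "Masters"},
    "ben": {"Slam", "Masters", "Challenger"},
    "cai": {"Masters", "Challenger"},
    "dee": {"Challenger", "Futures"},
}
tournaments = sorted({t for ts in attendance.values() for t in ts})
scores = eigenvector_centrality(tournaments, co_attendance_weights(attendance))

# Print tournaments from most to least central.
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.3f}")
```

A tournament scores highly here when it shares many players with other high-scoring tournaments, which is the intuition behind using centrality as a prestige proxy.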

New NERDS paper out on bicycle network quality in Denmark

We published a new all-NERDS paper, applying our BikeDNA tool to the whole country of Denmark as part of our Cykelpulje project!

How Good Is Open Bicycle Network Data? A Countrywide Case Study of Denmark, by A. Rahbek Vierø, A. Vybornova, and M. Szell, published in Geographical Analysis


We compare the two largest open data sets on dedicated bicycle infrastructure in Denmark, OpenStreetMap (OSM) and GeoDanmark, in a countrywide data quality assessment, asking whether the data are good enough for network-based analysis of cycling conditions. We find that neither of the data sets is of sufficient quality, and that data conflation is necessary to obtain a more complete data set. Our analysis of the spatial variation of data quality suggests that rural areas are more prone to incomplete data. We demonstrate that the prevalent method of using infrastructure density as a proxy for data completeness is not suitable for bicycle infrastructure data, and that matching of corresponding features is thus necessary to assess data completeness. Based on our data quality assessment, we recommend strategic mapping efforts toward data completeness, consistent standards to support comparability between different data sources, and increased focus on data topology to ensure high-quality bicycle network data.

Explore also the interactive map: https://anerv.github.io/bikedna_webmap/

Five new NERDS publications out!

We have been very productive this year already! Five new NERDS publications are released this week:

  1. Which sport is becoming more predictable? A cross-discipline analysis of predictability in team sports, by M. Coscia, published in EPJ Data Science

    We analyze more than 300,000 professional sports matches in the 1996-2023 period from nine disciplines, to identify which disciplines are getting more/less predictable over time. We investigate the home advantage effect, since it can affect outcome predictability and it has been impacted by the COVID-19 pandemic. Going beyond previous work, we estimate which sport management model – between the egalitarian one popular in North America and the rich-get-richer used in Europe – leads to more uncertain outcomes. Our results show that there is no generalized trend in predictability across sport disciplines, that home advantage has been decreasing independently from the pandemic, and that sports managed with the egalitarian North American approach tend to be less predictable. We base our result on a predictive model that ranks teams by analyzing the directed network of who-beats-whom, where the most central teams in the network are expected to be the best performing ones.

  2. Algorithmic Fairness: Learnings From a Case That Used AI For Decision Support, by V. Sekara, T.S. Skadegard Thorsen, and R. Sinatra, published by the Crown Princess Mary Center

    This policy brief provides a small introduction to algorithmic fairness and an example of auditing fairness in an algorithm which was aimed at identifying and assessing children at risk from abuse.

  3. The Parrot Dilemma: Human-Labeled vs. LLM-augmented Data in Classification Tasks, by A.G. Møller, J.A. Dalsgaard, A. Pera, L.M. Aiello (accepted at EACL’24).
    How good are Large Language Models in generating synthetic examples for training classifiers? To find out, we used GPT4 and Llama2 to augment existing training sets for typical Computational Social Science tasks. Our experiments show that the time to replace human-generated training data with LLMs has yet to come: human-generated text and labels provide more valuable information during training for most tasks. However, artificial data augmentation can add value when encountering extremely rare classes in multi-class scenarios, as finding new examples in real-world data can be challenging. 

  4. Shifting Climates: Climate Change Communication from YouTube to TikTok, by A. Pera, L.M. Aiello (accepted at WebSci’24).

    How do video content creators tailor their communication strategies in the era of short-form content? We conducted a comparative study of the YouTube and TikTok video productions of 21 prominent climate communicators active on both platforms. We found that when using TikTok, creators use a more emotionally resonant, self-referential, and action-oriented language compared to YouTube. Also, the response of the public aligns more closely to the tone of the videos in TikTok.

  5. The role of interface design on prompt-mediated creativity in Generative AI, by M. Torricelli, M. Martino, A. Baronchelli, L.M. Aiello (accepted at WebSci’24).
    We analyze 145k+ user prompts from two Generative AI platforms for image generation to see how people explore new concepts over time, and how their exploration might be influenced by different design choices in human-computer interfaces to Generative AI. We find that creativity in prompts declines when the interface provides generation shortcuts that divert the user's attention from prompting.
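
As a side note on the first paper in this list: ranking teams from a who-beats-whom network can be sketched in a few lines. The example below uses PageRank as the centrality measure, which is an assumption made for illustration and may differ from the measure used in the paper. The teams and match results are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Rank nodes of a directed graph given as (loser, winner) pairs.

    Each loss passes a share of the loser's rank to the winner, so teams
    that beat strong opponents accumulate the most rank.
    """
    nodes = sorted({n for edge in edges for n in edge})
    beaten_by = {n: [] for n in nodes}
    for loser, winner in edges:
        beaten_by[loser].append(winner)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            if beaten_by[n]:
                share = damping * rank[n] / len(beaten_by[n])
                for winner in beaten_by[n]:
                    new[winner] += share
            else:
                # Undefeated team (no outgoing edges): spread its rank evenly.
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
        rank = new
    return rank


# Hypothetical results, written as (loser, winner).
matches = [
    ("B", "A"), ("C", "A"), ("D", "A"),
    ("C", "B"), ("D", "B"),
    ("D", "C"),
]
ranks = pagerank(matches)
ordering = sorted(ranks, key=ranks.get, reverse=True)
print(ordering)
```

Pointing edges from loser to winner means that centrality flows toward stronger teams, so a prediction rule could simply favor the more central team in each upcoming match.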

New NERDS papers: Network reorganization, Mastodon migration, News sharing on Facebook

We have three brand new papers out, this time in PNAS, Scientific Reports, and the Journal of Quantitative Description:

  1. Socioeconomic reorganization of communication and mobility networks in response to external shocks, by L. Napoli, V. Sekara, M. García-Herranz, and M. Karsai, published in PNAS

    We analyze mobile phone communication data to investigate the dynamics of network segregation patterns of the same set of people both in terms of mobility and of social communication during the initial wave of COVID-19 in Sierra Leone. Interestingly, we find opposite trends in the network segregation dynamics, characterized overall by simultaneous increase in mobility segregation and reduction in social network segregation. Our results underscore the significance of data-driven studies going beyond single-axis approaches to assess the impact of emergency policies.
  2. Drivers of social influence in the Twitter migration to Mastodon, by L. La Cava, L.M. Aiello, and A. Tagarelli, published in Scientific Reports

    We analyzed the social network and the public conversations of about 75,000 users who migrated from Twitter to Mastodon, as we NERDS did too a year ago, and observed that the temporal trace of their migrations is compatible with a phenomenon of social influence, as described by a compartmental epidemic model of information diffusion. Drawing from prior research on behavioral change, we delved into the factors that account for variations of the effectiveness of the influence process across different Twitter communities.
    Read more in our blog post:
    https://communities.springernature.com/posts/get-out-of-the-nest-drivers-of-social-influence-in-the-twitter-migration-to-mastodon
  3. Cracking Open the European Newsfeed, by L. Rossi, F. Giglietto, and G. Marino, published in Journal of Quantitative Description: Digital Media

    This paper contributes to the ongoing effort to describe and quantify the quality of information that is shared on large social media platforms. We do this by complementing existing research that provided a first quantitative assessment of the quality of the information circulating on Facebook among US users. Leveraging an updated version of the same data source — Meta’s URL Shares Dataset — and replicating much of the methodology, we quantify the trustworthy and untrustworthy links to external websites that have been shared on Facebook in the period between 2019 and 2022 in three major European countries (Germany, France, and Italy). We observe a clear decline in the number of URLs present in the dataset and an increase in the URLs from untrustworthy domains as a percentage of the total URLs shared in a year. This increase seems to be higher in electoral years (in Germany and in Italy) but it does not translate into an increase of Views received from untrustworthy sources.