NFDI4DS | UHH-SEMS - Publication Details

Michelle Sweering

ORCID: 0000-0003-1200-6015

About

Contact & Profiles

A5029963514

Research Areas

Algorithms and Data Compression
Natural Language Processing Techniques
Semigroups and Automata Theory
Privacy-Preserving Technologies in Data
DNA and Biological Computing
Genomics and Phylogenetic Studies
Complexity and Algorithms in Graphs
Cryptography and Data Security
Advanced Graph Theory Research
Data Quality and Management
Data Mining Algorithms and Applications
Plant and Animal Studies
Genome Rearrangement Algorithms
Ecology and Vegetation Dynamics Studies
Optimization and Search Problems
Network Packet Processing and Optimization
Imbalanced Data Classification Techniques
Plant Parasitism and Resistance
Data Management and Algorithms
Machine Learning in Bioinformatics
Fuzzy and Soft Set Theory
3D Shape Modeling and Analysis
Authorship Attribution and Profiling
Diabetic Foot Ulcer Assessment and Management
Access Control and Trust

Centrum Wiskunde & Informatica
2019-2025

Vitenparken
2022

Conservation Leadership Programme
2018-2019

University of Cambridge
2018-2019

bmotif: A package for motif analyses of bipartite networks

OPENALEX - Publications

Benno I. Simmons Michelle Sweering Maybritt Schillinger Lynn V. Dicks William J. Sutherland and 1 more

Abstract: Bipartite networks are widely used to represent a diverse range of species interactions, such as pollination, herbivory, parasitism and seed dispersal. The structure of these networks is usually characterised by calculating one or more indices that capture different aspects of network architecture. While these indices capture useful properties of networks, they are relatively insensitive to changes in network structure. Consequently, variation in ecologically‐important interactions can be missed. Network motifs are a way to characterise...

10.1111/2041-210x.13149 article EN cc-by Methods in Ecology and Evolution 2019-01-12

Missing value replacement in strings and applications

Giulia Bernardini Chang Liu Grigorios Loukides Alberto Marchetti-Spaccamela Solon P. Pissis and 2 more

Abstract: Missing values arise routinely in real-world sequential (string) datasets due to: (1) imprecise data measurements; (2) flexible sequence modeling, such as binding profiles of molecular sequences; or (3) the existence of confidential information in a dataset which has been deleted deliberately for privacy protection. In order to analyze such datasets, it is often important to replace each missing value, with one or more valid letters, in an efficient and effective way. Here we formalize this task...

10.1007/s10618-024-01074-3 article EN cc-by Data Mining and Knowledge Discovery 2025-01-22

Convergence of the Number of Period sets in Strings

Éric Rivals Michelle Sweering Pengfei Wang

10.1007/s00453-025-01295-y article EN Algorithmica 2025-02-21

Elastic-Degenerate String Comparison

Esteban Gabory Moses Njagi Mwaniki Nadia Pisanti Solon P. Pissis Jakub Radoszewski and 2 more

10.1016/j.ic.2025.105296 article EN cc-by Information and Computation 2025-03-01

Pangenome comparison via ED strings

Esteban Gabory Moses Njagi Mwaniki Nadia Pisanti Solon P. Pissis Jakub Radoszewski and 2 more

Introduction: An elastic-degenerate (ED) string is a sequence of sets of strings. It can also be seen as a directed acyclic graph whose edges are labeled by strings. The notion of ED strings was introduced as a simple alternative to variation and sequence graphs for representing a pangenome, that is, a collection of genomic sequences to be analyzed jointly or to be used as a reference. Methods: In this study, we define notions of matching statistics of two ED strings as similarity measures between pangenomes and, consequently, infer a corresponding distance measure. We...

10.3389/fbinf.2024.1397036 article EN cc-by Frontiers in Bioinformatics 2024-09-26

Bidirectional String Anchors for Improved Text Indexing and Top-$K$ Similarity Search

Grigorios Loukides Solon P. Pissis Michelle Sweering

The minimizers sampling mechanism is a popular mechanism for string sampling. However, minimizers sampling mechanisms lack good guarantees on the expected size of their samples for different combinations of their input parameters. Furthermore, indexes constructed over minimizers samples lack good worst-case guarantees for on-line pattern searches. In response, we propose bidirectional string anchors (bd-anchors), a new string sampling mechanism. Given an integer $\ell$, our mechanism selects the lexicographically smallest rotation in every...

10.1109/tkde.2022.3231780 article EN IEEE Transactions on Knowledge and Data Engineering 2023-01-16
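
The sampling rule described in the abstract above can be illustrated with a naive sketch. It assumes that the truncated sentence refers to every length-$\ell$ fragment of the input, and that ties between equal rotations are broken by taking the leftmost start; both are assumptions, and the function names `smallest_rotation_start` and `bd_anchors` are hypothetical. This quadratic-time version only shows what is being selected, not how the paper computes it.

```python
def smallest_rotation_start(x):
    """Start index of the lexicographically smallest rotation of x.

    Ties between equal rotations go to the leftmost index (an assumption).
    """
    return min(range(len(x)), key=lambda i: (x[i:] + x[:i], i))


def bd_anchors(s, ell):
    """Naive sketch: 0-based anchor positions over all length-ell fragments of s."""
    anchors = set()
    for j in range(len(s) - ell + 1):
        fragment = s[j:j + ell]
        # Map the rotation start inside the fragment back to a position in s.
        anchors.add(j + smallest_rotation_start(fragment))
    return sorted(anchors)


print(bd_anchors("aabaaabcbda", 5))
```

Consecutive fragments often agree on the same position, which is why the sample can be much smaller than the number of fragments.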

String Sanitization Under Edit Distance

Giulia Bernardini Huiping Chen Grigorios Loukides Nadia Pisanti Solon P. Pissis and 2 more

Let W be a string of length n over an alphabet Σ, k be a positive integer, and S be a set of length-k substrings of W. The ETFS problem asks us to construct a string X_{ED} such that: (i) no string of S occurs in X_{ED}; (ii) the order of all other length-k strings over Σ is the same in W and in X_{ED}; and (iii) X_{ED} has minimal edit distance to W. When W represents an individual’s data and S represents the set of confidential substrings, algorithms solving ETFS can be applied for utility-preserving string sanitization [Bernardini et al., ECML PKDD 2019]. Our first result here is an algorithm to solve ETFS in O(kn²) time, which improves on the state of the art [Bernardini et al., arXiv...

10.4230/lipics.cpm.2020.7 preprint EN other-oa HAL (Le Centre pour la Communication Scientifique Directe) 2020-06-17

Combinatorial Algorithms for String Sanitization

Giulia Bernardini Huiping Chen Alessio Conte Roberto Grossi Grigorios Loukides and 4 more

String data are often disseminated to support applications such as location-based service provision or DNA sequence analysis. This dissemination, however, may expose sensitive patterns that model confidential knowledge (e.g., trips to mental health clinics from a string representing a user’s location history). In this article, we consider the problem of sanitizing a string by concealing the occurrences of sensitive patterns, while maintaining data utility, in two settings that are relevant to many common string processing tasks. In the first setting,...

10.1145/3418683 article EN ACM Transactions on Knowledge Discovery from Data 2020-12-07

bmotif: a package for motif analyses of bipartite networks

Benno I. Simmons Michelle Sweering Maybritt Schillinger Lynn V. Dicks William J. Sutherland and 1 more

Abstract: Bipartite networks are widely used to represent a diverse range of species interactions, such as pollination, herbivory, parasitism and seed dispersal. The structure of these networks is usually characterised by calculating one or more metrics that capture different aspects of network architecture. While these metrics capture useful properties of networks, they only consider structure at the scale of the whole network (the macro-scale) or individual species (the micro-scale). ‘Meso-scale’ structure between these scales is usually ignored, despite representing ecologically-important...

10.1101/302356 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2018-04-17

On Breaking Truss-Based Communities

Huiping Chen Alessio Conte Roberto Grossi Grigorios Loukides Solon P. Pissis and 1 more

A k-truss is a graph such that each edge is contained in at least k-2 triangles. This notion has attracted much attention, because it models meaningful cohesive subgraphs of a graph. We introduce the problem of identifying a smallest edge subset of a given graph whose removal makes the graph k-truss-free. We also introduce a problem variant where the identified subset contains only edges incident to a given set of nodes and ensures that these nodes are not contained in any k-truss. These problems are directly applicable in communication networks: the identified edges correspond to vital network connections; or in social networks: the identified edges can be...

10.1145/3447548.3467365 preprint EN 2021-08-12
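
The k-truss definition in the abstract above lends itself to a small worked example. The sketch below is only an illustration under stated assumptions: it finds the maximal k-truss by repeatedly peeling edges that lie in fewer than k-2 triangles, and it solves the edge-removal problem by exhaustive search, which is exponential and usable on toy graphs only. It is not the algorithm from the paper, and the names `k_truss_edges`, `is_k_truss_free` and `smallest_breaking_set` are hypothetical.

```python
from itertools import combinations


def k_truss_edges(edges, k):
    """Edges of the maximal subgraph in which every edge lies in >= k-2 triangles."""
    adj = {}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                # Common neighbours of u and v = triangles through edge {u, v}.
                if len(adj[u] & adj[v]) < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True
    return {frozenset((u, v)) for u in adj for v in adj[u]}


def is_k_truss_free(edges, k):
    """True if no non-empty subgraph of the given graph is a k-truss."""
    return not k_truss_edges(edges, k)


def smallest_breaking_set(edges, k):
    """Brute force: a smallest edge subset whose removal leaves the graph k-truss-free."""
    edges = list(edges)
    for r in range(len(edges) + 1):
        for removed in combinations(edges, r):
            rest = [e for e in edges if e not in removed]
            if is_k_truss_free(rest, k):
                return set(removed)


# Complete graph on four nodes: every edge lies in exactly two triangles.
K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(len(k_truss_edges(K4, 4)), len(smallest_breaking_set(K4, 4)))
```

In the complete graph on four nodes, removing any single edge already destroys the 4-truss, because the loss of support cascades through the remaining edges during peeling.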

Hide and Mine in Strings: Hardness and Algorithms

Giulia Bernardini Alessio Conte Garance Gourdel Roberto Grossi Grigorios Loukides and 5 more

We initiate a study on the fundamental relation between data sanitization (i.e., the process of hiding confidential information in a given dataset) and frequent pattern mining, in the context of sequential (string) data. Current methods for string sanitization hide confidential patterns introducing, however, a number of spurious patterns that may harm the utility of frequent pattern mining. The main computational problem is to minimize this harm. Our contribution here is twofold. First, we present several hardness results, for different variants of this problem, essentially showing...

10.1109/icdm50108.2020.00103 article EN 2020 IEEE International Conference on Data Mining (ICDM) 2020-11-01

Hide and Mine in Strings: Hardness, Algorithms, and Experiments

Giulia Bernardini Alessio Conte Garance Gourdel Roberto Grossi Grigorios Loukides and 5 more

Data sanitization and frequent pattern mining are two well-studied topics in data mining. Our work initiates a study on the fundamental relation between data sanitization and frequent pattern mining in the context of sequential (string) data. Current methods for string sanitization hide confidential patterns. This, however, may lead to spurious patterns that harm the utility of frequent pattern mining. The main computational problem is to minimize this harm. Our contribution here is as follows. First, we present several hardness results, for different variants of this problem, essentially showing that these variants cannot...

10.1109/tkde.2022.3158063 article EN publisher-specific-oa IEEE Transactions on Knowledge and Data Engineering 2022-01-01

On Breaking Truss-based and Core-based Communities

Huiping Chen Alessio Conte Roberto Grossi Grigorios Loukides Solon P. Pissis and 1 more

We introduce the general problem of identifying a smallest edge subset of a given graph whose deletion makes the graph community-free. We consider this problem under two community notions that have attracted significant attention: k-truss and k-core. We also introduce a problem variant where the identified subset contains edges incident to a given set of nodes and ensures that these nodes are not contained in any community: k-truss or k-core, in our case. These problems are directly applicable in social networks: the identified edges can be hidden by users or sanitized from the output graph; or in communication networks: the identified edges correspond...

10.1145/3644077 article EN other-oa ACM Transactions on Knowledge Discovery from Data 2024-02-15

Connecting de Bruijn Graphs

Giulia Bernardini Huiping Chen Inge Li Gørtz Christoffer Krogh Grigorios Loukides and 3 more

10.4230/lipics.cpm.2024.6 article EN cc-by 2024-06-25

Elastic-Degenerate String Comparison

Esteban Gabory Moses Njagi Mwaniki Nadia Pisanti Solon P. Pissis Jakub Radoszewski and 2 more

An elastic-degenerate (ED) string $T$ is a sequence of $n$ sets $T[1],\ldots,T[n]$ containing $m$ strings in total whose cumulative length is $N$. We call $n$, $m$, and $N$ the length, the cardinality and the size of $T$, respectively. The language of $T$ is defined as $L(T)=\{S_1 \cdots S_n \,:\, S_i \in T[i] \text{ for all } i\in[1,n]\}$. ED strings have been introduced to represent a set of closely-related DNA sequences, also known as a pangenome. The basic question we investigate here is: Given two ED strings, how fast can we check whether the two languages...

10.48550/arxiv.2411.07782 preprint EN arXiv (Cornell University) 2024-11-12
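
The definitions of $n$, $m$, $N$ and $L(T)$ in the abstract above can be made concrete with a brute-force sketch. It assumes an ED string is given as a Python list of sets of strings and that the empty string contributes length 0 to $N$ (the paper may count it differently). The names `ed_parameters` and `ed_language` are hypothetical, and explicit enumeration is exponential in $n$, so this is meant for toy inputs only.

```python
from itertools import product


def ed_parameters(T):
    """Return (n, m, N): length, cardinality and size of the ED string T."""
    n = len(T)
    m = sum(len(segment) for segment in T)
    N = sum(len(s) for segment in T for s in segment)  # empty strings count as 0 here
    return n, m, N


def ed_language(T):
    """Enumerate L(T): every concatenation S_1 ... S_n with S_i taken from T[i]."""
    return {"".join(choice) for choice in product(*T)}


T = [{"A"}, {"C", "CG", ""}, {"T"}]
print(ed_parameters(T))
print(sorted(ed_language(T)))
```

On small inputs, the languages of two ED strings enumerated this way can be compared with ordinary set operations, which is useful as a reference when testing faster algorithms.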

Elastic-Degenerate String Matching with 1 Error or Mismatch

Giulia Bernardini Esteban Gabory Solon P. Pissis Leen Stougie Michelle Sweering and 1 more

10.1007/s00224-024-10194-8 article EN cc-by Theory of Computing Systems 2024-09-16

A Universal Error Measure for Input Predictions Applied to Online Graph Problems

Giulia Bernardini Alexander Lindermayr Alberto Marchetti-Spaccamela Nicole Megow Leen Stougie and 1 more

We introduce a novel measure for quantifying the error in input predictions. The error is based on a minimum-cost hyperedge cover in a suitably defined hypergraph and provides a general template which we apply to online graph problems. The measure captures errors due to absent predicted requests as well as unpredicted actual requests; hence, predicted and actual inputs can be of arbitrary size. We achieve refined performance guarantees for previously studied network design problems in the online-list model, such as Steiner tree and facility location. Further,...

10.48550/arxiv.2205.12850 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Combinatorial Algorithms for String Sanitization

Giulia Bernardini Huiping Chen Alessio Conte Roberto Grossi Grigorios Loukides and 4 more

String data are often disseminated to support applications such as location-based service provision or DNA sequence analysis. This dissemination, however, may expose sensitive patterns that model confidential knowledge. In this paper, we consider the problem of sanitizing a string by concealing the occurrences of sensitive patterns, while maintaining data utility, in two settings that are relevant to many common string processing tasks. In the first setting, we aim to generate the minimal-length string that preserves the order of appearance and frequency of all...

10.48550/arxiv.1906.11030 preprint EN cc-by arXiv (Cornell University) 2019-01-01

Digital Foot Models from Sensor Data for Improved Orthopedic Aid Fitting

Giuseppe Carere Bernard J. Geurts Bas van ’t Hof F.C. Holtkamp Erwin Luesink and 4 more

This report describes and develops different methods for converting 3D time series data to surface representations. The considered methods contain public domain mesh generation software, as well as linear regression models and representations using signed distance functions. We provide a simple code base for the latter two such that they can be used for further research in a simple manner. We apply and test the algorithms on point cloud data of a foot model. All methods yield good representations of the underlying geometry. Such methods can therefore have a big impact on handling problems.

10.33774/miir-2023-mk9wn preprint EN cc-by 2023-09-30

Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications

Esteban Gabory Moses Njagi Mwaniki Nadia Pisanti Solon P. Pissis Jakub Radoszewski and 2 more

10.4230/lipics.cpm.2023.11 article EN cc-by 2023-01-01

String Sanitization Under Edit Distance: Improved and Generalized

Takuya Mieno Solon P. Pissis Leen Stougie Michelle Sweering

Let $W$ be a string of length $n$ over an alphabet $\Sigma$, $k$ be a positive integer, and $\mathcal{S}$ be a set of length-$k$ substrings of $W$. The ETFS problem asks us to construct a string $X_{\mathrm{ED}}$ such that: (i) no string of $\mathcal{S}$ occurs in $X_{\mathrm{ED}}$; (ii) the order of all other length-$k$ strings over $\Sigma$ (and thus the frequency) is the same in $W$ and in $X_{\mathrm{ED}}$; and (iii) $X_{\mathrm{ED}}$ has minimal edit distance to $W$. When $W$ represents an individual's data and $\mathcal{S}$ represents the set of confidential patterns, the ETFS problem asks for transforming $W$ to preserve its privacy and its utility [Bernardini et al., ECML PKDD 2019]. ETFS can be solved...

10.48550/arxiv.2007.08179 preprint EN cc-by arXiv (Cornell University) 2020-01-01