A theoretical and empirical study of affirmative sampling

thumbnail

Tutor / Supervisor

Student

Montes Sanabria, Jordi

Document type

Master thesis

Date

2020

rights

Restricted access - author's decisionRestricted access - author's decision

Publisher

Universitat Politècnica de Catalunya



Abstract

Distinct random sampling is widely used in many applications due to its ability to answer aggregate queries on data with provable probabilistic guarantees on the quality of the answer. We consider the data stream model where a single pass over the data is allowed and only a small number of elements in the stream can be stored. We analyze Affirmative Sampling, an algorithm for distinct random sampling with adaptive sample size. We discuss an unbiased cardinality estimator of the stream using the elements in the random sample from Affirmative sampling and provide bounds for its accuracy. Random samples can be used for much more than just cardinality estimation. We introduce Nooh, a fast genome distance estimation tool that uses Affirmative sampling to compete with the current state-of-the-art tools in bioinformatics.
user

Participating teacher

Files