Dataset used for detecting DNS over HTTPS by Machine Learning
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F63839172%3A_____%2F20%3A10133288" target="_blank" >RIV/63839172:_____/20:10133288 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.5281/zenodo.3906526" target="_blank" >http://dx.doi.org/10.5281/zenodo.3906526</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.5281/zenodo.3906526" target="_blank" >10.5281/zenodo.3906526</a>
Alternative languages
Result language
English
Original language name
Dataset used for detecting DNS over HTTPS by Machine Learning
Original language description
The web browser data was captured using the Selenium framework, which simulated typical user browsing. The browsers received commands to visit domains taken from Alexa's top 10K most visited websites. The capture was performed on the host by listening on the network interface of the virtual machine. Overall, the dataset contains almost 5,000 web-page visits by Mozilla and 1,000 pages visited by Chrome. The Cloudflared DoH proxy was installed on a Raspberry Pi, and the IP address of the Raspberry Pi was set as the default DNS resolver in two separate offices at our university. It continuously captured the DNS/DoH traffic created by up to 20 devices for around three months. The dataset contains 1,128,904 flows, of which around 33,000 are labeled as DoH. We provide raw pcap data, a CSV file with flow data, and a CSV file with extracted features.
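The flow counts quoted in the description imply a strongly imbalanced labeling, which is worth checking before training a classifier. The following is a minimal sketch of how one might count labels in a flow-level CSV and compute the DoH share from the quoted figures. The column name `is_doh`, the other column names, and the sample rows are hypothetical placeholders, not the actual schema of the published files.

```python
import csv
import io
from collections import Counter


def label_counts(csv_file, label_column="is_doh"):
    """Count flows per label value in a flow-level CSV.

    `label_column` is a hypothetical name; check the real CSV header.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


def doh_share(doh_flows, total_flows):
    """Fraction of flows labeled as DoH."""
    return doh_flows / total_flows


# Tiny in-memory stand-in for a flow CSV (made-up rows, hypothetical columns).
sample = io.StringIO(
    "src_ip,dst_ip,is_doh\n"
    "10.0.0.1,1.1.1.1,1\n"
    "10.0.0.2,93.184.216.34,0\n"
    "10.0.0.3,1.0.0.1,1\n"
)
counts = label_counts(sample)
print(counts)

# Share implied by the description: about 33,000 DoH flows out of 1,128,904.
approx_share = doh_share(33_000, 1_128_904)
print(f"{approx_share:.1%}")
```

With roughly 3% of flows labeled as DoH, plain accuracy would be a misleading metric, so per-class precision and recall are likely a better fit for evaluation.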
Czech name
—
Czech description
—
Classification
Type
O - Miscellaneous
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
R - EC Framework Programme project
Others
Publication year
2020
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations