Social media data processing infrastructure by using Apache spark big data platform: Twitter data analysis

Result identifiers

  • Result code in IS VaVaI

    RIV/61989100:27740/19:10244019 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989100%3A27740%2F19%3A10244019)

  • Result on the web

    https://dl.acm.org/doi/10.1145/3361821.3361825

  • DOI - Digital Object Identifier

    10.1145/3361821.3361825 (http://dx.doi.org/10.1145/3361821.3361825)

Alternative languages

  • Result language

    English

  • Title in the original language

    Social media data processing infrastructure by using Apache spark big data platform: Twitter data analysis

  • Result description in the original language

    Social media provide continuous data streams that contain information with different level of sensitivity, validity and accuracy. Therefore, this type of information has to be properly filtered, extracted and processed to avoid noisy and inaccurate results. The main goal of this work is to propose architecture and workflow able to process Twitter social network data in near real-time. The primary design of the introduced modern architecture covers all processing aspects from data ingestion and storing to data processing and analysing. This paper presents Apache Spark and Hadoop implementation. The secondary objective is to analyse tweets with the defined topic - floods. The word frequency method (Word Clouds) is shown as a major tool to analyse the content of the input dataset. The experimental architecture confirmed the usefulness of many well-known functions of Spark and Hadoop in the social data domain. The platforms which were used provided effective tools for optimal data ingesting, storing as well as processing and analysing. Based on the analytical part, it was observed that the word frequency method (n-grams) can effectively reveal the tweets content. According to the results of this study, the tweets proved their high informative potential regarding data quality and quantity. (C) 2019 Association for Computing Machinery.

Classification

  • Type

    D - Article in proceedings

  • CEP field

  • OECD FORD field

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

    LQ1602: IT4Innovations excellence in science

  • Continuities

    P - Research and development project financed from public funds (with a link to CEP)

Others

  • Publication year

    2019

  • Data confidentiality code

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

  • Article name in the proceedings

    CCIOT 2019 : September 20-22, 2019, Tokyo, Japan : 2019 4th International Conference on Cloud Computing and Internet of Things

  • ISBN

    978-1-4503-7241-1

  • ISSN

  • e-ISSN

  • Number of pages

    6

  • Pages from-to

    1-6

  • Publisher name

    Association for Computing Machinery

  • Place of publication

    New York

  • Event location

    Tokyo

  • Event date

    September 20, 2019

  • Type of event by nationality

    WRD - Worldwide event

  • UT WoS article code