Performance Analysis of Small Files in HDFS using Clustering Small Files based on Centroid Algorithm

Rathidevi, R. and Parameswari, R. (2021) Performance Analysis of Small Files in HDFS using Clustering Small Files based on Centroid Algorithm. 2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). pp. 640-643.

Abstract

In day-to-day life, a large number of files is generated in many areas owing to the rapid development of technology, and storing these files consumes a lot of storage space. Big Data does not consist only of large files: a large number of small files is also considered Big Data. Hadoop is used to process large files, but processing small files in Hadoop is not easy, because HDFS places every file, however small, in at least one separate block (the default block size is 128 MB), and the NameNode must keep metadata for each of these files and blocks in memory. To overcome this, the Clustering Small Files based on Centroid (CSFC) approach is used to place related files in the same cluster. If a fetched file is not related to any existing cluster, it is treated as distinct and a new cluster is created for it. The combined files are forwarded to HDFS for further processing. The NameNode holds the metadata and the DataNodes hold the dataset, so the dataset can be fetched efficiently and directly from the DataNodes in HDFS.
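The abstract does not state which features, similarity measure or threshold CSFC uses, so the following is only a minimal sketch of centroid-based grouping under assumptions, not the authors' implementation. Each file is represented as a term-count vector (the hypothetical `tf_vector`). A file joins the cluster whose centroid is most similar by cosine similarity if that similarity reaches a hypothetical `threshold` of 0.3, and otherwise starts a new cluster. Each cluster is then concatenated into one combined file with a hypothetical offset index (`merge_cluster`). Finally, `namespace_objects` counts the files plus blocks the NameNode would track, assuming non-empty files and the 128 MB default block size. The actual upload to HDFS is not shown.

```python
"""Sketch of centroid-based small-file clustering and merging (assumptions noted)."""
import math
from dataclasses import dataclass, field

# HDFS default block size (dfs.blocksize): 128 MB.
BLOCK_SIZE = 128 * 1024 * 1024

Vector = dict[str, float]  # sparse term -> weight


def tf_vector(text: str) -> Vector:
    """Hypothetical feature extractor: raw counts of lower-cased whitespace tokens."""
    vec: Vector = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0.0) + 1.0
    return vec


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Cluster:
    centroid: Vector = field(default_factory=dict)
    members: list[str] = field(default_factory=list)

    def add(self, name: str, vec: Vector) -> None:
        # Running mean: new centroid = (n * old centroid + vec) / (n + 1).
        n = len(self.members)
        keys = set(self.centroid) | set(vec)
        self.centroid = {
            k: (n * self.centroid.get(k, 0.0) + vec.get(k, 0.0)) / (n + 1) for k in keys
        }
        self.members.append(name)


def cluster_files(files: dict[str, str], threshold: float = 0.3) -> list[Cluster]:
    """Assign each file to its most similar centroid, or start a new cluster."""
    clusters: list[Cluster] = []
    for name, text in files.items():
        vec = tf_vector(text)
        best, best_sim = None, 0.0
        for c in clusters:
            sim = cosine(vec, c.centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best.add(name, vec)
        else:
            # Not related to any existing cluster: it gets a cluster of its own.
            new = Cluster()
            new.add(name, vec)
            clusters.append(new)
    return clusters


def merge_cluster(c: Cluster, files: dict[str, str]) -> tuple[bytes, dict[str, tuple[int, int]]]:
    """Concatenate a cluster's files; the index maps name -> (offset, length) in bytes."""
    blob = bytearray()
    index: dict[str, tuple[int, int]] = {}
    for name in c.members:
        data = files[name].encode("utf-8")
        index[name] = (len(blob), len(data))
        blob += data
    return bytes(blob), index


def namespace_objects(sizes: list[int]) -> int:
    """Files plus blocks the NameNode tracks; assumes every file is non-empty."""
    return sum(1 + (s + BLOCK_SIZE - 1) // BLOCK_SIZE for s in sizes)


if __name__ == "__main__":
    docs = {
        "a.log": "error disk full on node1",
        "b.log": "error disk full on node2",
        "c.txt": "quarterly sales report summary",
    }
    clusters = cluster_files(docs)
    print([c.members for c in clusters])
    merged = [merge_cluster(c, docs)[0] for c in clusters]
    print(namespace_objects([len(t.encode()) for t in docs.values()]),
          namespace_objects([len(m) for m in merged]))
```

In the demo, the two log files share most of their terms (cosine similarity 0.8), so they land in one cluster, and the report starts a cluster of its own. Merging therefore cuts the tracked namespace objects from six (three files, three blocks) to four. The offset index stands in for whatever mechanism a real system would use to read an individual small file back out of the combined file.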

Item Type: Article
Subjects: Computer Science > Cyber Security
Divisions: Computer Science
Depositing User: Mr IR Admin
Date Deposited: 13 Sep 2024 05:20
Last Modified: 13 Sep 2024 05:20
URI: https://ir.vistas.ac.in/id/eprint/5778