A Model for the Analytical Performance of Data Lake in Stock Market Analysis with Databricks Delta Lake

Kamalakkannan, S. and Yasmin, A. and R, Arunkumar. and Kavitha, P. (2023) A Model for the Analytical Performance of Data Lake in Stock Market Analysis with Databricks Delta Lake. In: 2023 International Conference on Self Sustainable Artificial Intelligence Systems (ICSSAS), Erode, India.


Abstract

Stock market investments are highly rewarding but also carry high risk. Modern investors use a variety of tools to make informed investment decisions. In the current digital era, the financial services industry generates huge volumes and immense varieties of data at extreme speed. Because of the rapid growth in data collection and the heterogeneous, complex nature of the data, there is a need for a Big Data analytical solution that can handle stock market data. Data lakes can store large volumes of unstructured, heterogeneous raw data in a massively scalable manner, which makes them well suited to the big data storage problem. The key feature of a data lake is its ability to preserve data in its original format while processing it at runtime using a schema-on-read technique. The challenge in a data lake is performing analytics, which is a significant tool for calculating and analyzing the stock market. The proposed architecture of Azure Databricks Delta Lake (ADDL) with Azure Data Lake Storage Generation 2 (ADLSG2) is used for analytical processes such as Fibonacci retracement for better stock analysis, which aids in forecasting the market price for better investment. The research therefore focuses on producing storage with both read and write capabilities, taking into consideration the Extract-Load-Transform (ELT) operation on the data source. In this experimental Databricks implementation, the runtime uses the open-source Apache Spark API and a highly improved execution engine, which results in a significant performance improvement compared with the standard Apache Spark available on the ADLS platform. Additionally, Fibonacci retracement levels are calculated through the analytics, and test close prices forecast with various ML and DL techniques, such as KNN and LSTM, are compared with the original prices of the test data to better predict the forecast close price.
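
The abstract names Fibonacci retracement as one of the analytical processes but does not state how the levels are computed. As a minimal sketch, the conventional calculation places each level at a fixed ratio of the distance between a swing high and a swing low. The function name, the ratio set, and the sample prices below are assumptions for illustration and are not taken from the paper.

```python
from typing import Dict, Sequence

# Commonly used retracement ratios (an assumption; the paper's exact set is not stated).
DEFAULT_RATIOS: Sequence[float] = (0.0, 0.236, 0.382, 0.5, 0.618, 0.786, 1.0)


def fibonacci_retracement_levels(
    high: float, low: float, ratios: Sequence[float] = DEFAULT_RATIOS
) -> Dict[float, float]:
    """Return the price level for each ratio, measured down from the swing high."""
    if high < low:
        raise ValueError("high must be greater than or equal to low")
    price_range = high - low
    # Ratio 0.0 maps to the swing high and ratio 1.0 maps to the swing low.
    return {ratio: high - ratio * price_range for ratio in ratios}


if __name__ == "__main__":
    # Hypothetical swing high and swing low, chosen only to show the arithmetic.
    for ratio, price in fibonacci_retracement_levels(150.0, 100.0).items():
        print(f"{ratio:.3f} -> {price:.2f}")
```

In a Databricks setting the same arithmetic could be applied per stock after reading the highs and lows from a Delta table, but that wiring depends on the table schema, which the abstract does not describe.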

Item Type: Conference or Workshop Item (Paper)
Subjects: Computer Science > Database Management System
Divisions: Information Technology
Depositing User: Mr IR Admin
Date Deposited: 21 Sep 2024 05:32
Last Modified: 21 Sep 2024 05:32
URI: https://ir.vistas.ac.in/id/eprint/6788
