곽재혁(Jae-Hyuck Kwak), 윤준원(Junweon Yoon), 정용환(Yonghwan Jung), 함재균(Jaegyoon Hahm), 박동인(Dongin Park) 한국정보과학회 2011 정보과학회 컴퓨팅의 실제 논문지 Vol.17 No.11
As data-intensive computing attracts growing attention in scientific applications, cloud computing has drawn interest because of the need to process large-scale data efficiently and quickly. Hadoop provides a software framework for large-scale data processing and analysis and is widely used as a representative cloud computing technology. In particular, Hadoop offers high scalability and performance together with strong fault detection and automatic recovery, and it is increasingly being adopted in scientific and engineering fields. In this paper, we propose a Hadoop-based method for analyzing the large-scale data generated in astronomical applications. The astronomical data of interest come from the SuperWASP project and consist of roughly ten million small observation files; because Hadoop is specialized for large-scale data processing, it is not well suited to handling such a large number of small files. We therefore packed the small input/output files into larger ones using a specialized data structure provided by Hadoop, and wrapped the astronomical analysis code as MapReduce jobs so that it can run on Hadoop.
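The file-packing step described above can be illustrated with Hadoop's SequenceFile container, a specialized data structure Hadoop provides for bundling many small records into one large, splittable file. The sketch below is an illustration under that assumption, not the authors' code; the choice of file name as key and raw bytes as value is ours, and paths are illustrative.

    // Minimal sketch: pack a directory of small observation files into one
    // Hadoop SequenceFile (file name -> raw bytes).
    import java.io.File;
    import java.nio.file.Files;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class PackSmallFiles {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(new Path(args[1])),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(BytesWritable.class),
                    SequenceFile.Writer.compression(SequenceFile.CompressionType.BLOCK))) {
                for (File f : new File(args[0]).listFiles()) {
                    byte[] data = Files.readAllBytes(f.toPath());
                    writer.append(new Text(f.getName()), new BytesWritable(data));
                }
            }
        }
    }

A MapReduce job can then read the packed file with SequenceFileInputFormat and invoke the existing analysis code on each record, which matches the wrapping approach the abstract describes.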
CAMP: Community Access MODIS Pipeline
Hendrix, V., Ramakrishnan, L., Ryu, Y., van Ingen, C., Jackson, K.R., Agarwal, D. North-Holland; Elsevier Science Ltd 2014 Future generations computer systems Vol.36 No.-
The Moderate Resolution Imaging Spectroradiometer (MODIS) instrument's land and atmosphere data are important to many scientific analyses that study processes at both local and global scales. The Terra and Aqua MODIS satellites acquire data over the entire Earth's surface every one to two days in 36 spectral bands. MODIS data complement many ground-based observations and are critical when studying global phenomena such as gross photosynthesis and evapotranspiration. However, data procurement and processing can be challenging and cumbersome due to the data volume and the scale of the analyses. For example, the very first step in MODIS data processing is to ensure that all products are in the same resolution and coordinate system. This reprojection step involves a complex inverse gridding algorithm and requires downloading tens of thousands of files for a single year, which is often infeasible on a scientist's desktop. Thus, large-scale resource environments such as high performance computing (HPC) environments are becoming crucial for processing MODIS data. However, HPC environments have traditionally been used for tightly coupled applications and present several challenges for managing data-intensive pipelines. We have developed a data-processing pipeline that downloads the MODIS swath products and reprojects the data to a sinusoidal system on an HPC system. The 10-year archive of reprojected data generated using the pipeline is made available through a web portal. In this paper, we detail a system architecture (CAMP) that manages the lifecycle of MODIS data, including procurement, storage, processing and dissemination. Our system architecture was developed in the context of the MODIS reprojection pipeline but is extensible to other analyses of MODIS data. Additionally, our work provides a framework and valuable experiences for future developments and deployments of data-intensive pipelines from other scientific domains on HPC systems.
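The sinusoidal reprojection at the core of the pipeline is easy to state mathematically: on a spherical Earth of radius R, a point at longitude λ and latitude φ maps to x = R·λ·cos(φ), y = R·φ. The sketch below is a simplified illustration of that forward mapping and its inverse (the direction inverse gridding needs); it is not the CAMP code, and the sphere radius is the value commonly quoted for the MODIS sinusoidal grid.

    // Minimal sketch of the sinusoidal projection used for MODIS land grids.
    public class SinusoidalProjection {
        static final double R = 6371007.181; // sphere radius in meters (assumed)

        // lon/lat in degrees -> x/y in meters on the sinusoidal plane
        static double[] forward(double lonDeg, double latDeg) {
            double lon = Math.toRadians(lonDeg);
            double lat = Math.toRadians(latDeg);
            return new double[] { R * lon * Math.cos(lat), R * lat };
        }

        // inverse mapping: x/y in meters -> lon/lat in degrees
        static double[] inverse(double x, double y) {
            double lat = y / R;
            double lon = x / (R * Math.cos(lat));
            return new double[] { Math.toDegrees(lon), Math.toDegrees(lat) };
        }

        public static void main(String[] args) {
            double[] xy = forward(127.0, 37.5);
            double[] ll = inverse(xy[0], xy[1]);
            System.out.printf("x=%.1f m, y=%.1f m -> lon=%.4f, lat=%.4f%n",
                    xy[0], xy[1], ll[0], ll[1]);
        }
    }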
Range Segmentation of Dynamic Offloading (RSDO) Algorithm by Correlation for Edge Computing
Jieun Kang, Svetlana Kim, Jae-ho Kim, Nak-myoung Sung, Yong-Ik Yoon 한국정보처리학회 2021 Journal of information processing systems Vol.17 No.5
In recent years, edge computing technology, built from many Internet of Things (IoT) devices with embedded sensors, has improved significantly for monitoring, detection, and management in environments where big data is commercialized. The main focus of edge computing is data optimization and task offloading, driven by the development of data- and task-intensive applications. However, existing offloading approaches do not consider the correlations and associations between the data and tasks involved in edge computing. Collaborative offloading that is segmented without considering the interaction between data and tasks can lead to data loss and delays when work moves from edge to edge. This article proposes a range segmentation of dynamic offloading (RSDO) algorithm that isolates the offload range and the collaborating edge nodes around each edge node's function to address this offloading issue. The RSDO algorithm groups highly correlated data and tasks according to the cause of the overload and dynamically distributes offloading ranges according to the state of the cooperating nodes. This segmentation improves the overall performance of edge nodes, balances edge computing, and reduces data loss and average latency.
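The grouping of highly correlated data and tasks that the abstract describes can be illustrated with a standard Pearson correlation over per-task load traces. The sketch below is our illustration only (the paper does not publish code); the greedy grouping strategy, the threshold, and the toy traces are assumptions.

    // Illustrative sketch: group tasks whose load traces are highly
    // correlated, in the spirit of RSDO's correlation-based segmentation.
    import java.util.*;

    public class CorrelationGrouping {
        // Pearson correlation coefficient of two equal-length traces
        static double pearson(double[] a, double[] b) {
            double ma = Arrays.stream(a).average().orElse(0);
            double mb = Arrays.stream(b).average().orElse(0);
            double cov = 0, va = 0, vb = 0;
            for (int i = 0; i < a.length; i++) {
                cov += (a[i] - ma) * (b[i] - mb);
                va  += (a[i] - ma) * (a[i] - ma);
                vb  += (b[i] - mb) * (b[i] - mb);
            }
            return cov / Math.sqrt(va * vb);
        }

        // Greedily group tasks whose traces correlate above the threshold
        static List<List<Integer>> group(double[][] traces, double threshold) {
            List<List<Integer>> groups = new ArrayList<>();
            boolean[] assigned = new boolean[traces.length];
            for (int i = 0; i < traces.length; i++) {
                if (assigned[i]) continue;
                List<Integer> g = new ArrayList<>(List.of(i));
                assigned[i] = true;
                for (int j = i + 1; j < traces.length; j++) {
                    if (!assigned[j] && pearson(traces[i], traces[j]) >= threshold) {
                        g.add(j);
                        assigned[j] = true;
                    }
                }
                groups.add(g);
            }
            return groups;
        }

        public static void main(String[] args) {
            double[][] traces = { {1, 2, 3, 4}, {2, 4, 6, 8}, {4, 3, 2, 1} };
            System.out.println(group(traces, 0.9)); // prints [[0, 1], [2]]
        }
    }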
Enabling Remote Fault Diagnosis through Data-driven Grid Computing
Bing Tang, Li Zhang 보안공학연구지원센터 2016 International Journal of Grid and Distributed Computing Vol.9 No.4
Due to the complexity of modern manufacturing and mechanical equipment, it is difficult for equipment users or maintainers to perform fault diagnosis independently. In this paper, the development history of fault diagnosis technology is surveyed, with particular attention to Internet-based remote fault diagnosis systems. A remote fault diagnosis system based on grid computing technology is then proposed to enable collaborative resource sharing and problem solving among multiple equipment suppliers and equipment users by integrating all kinds of diagnostic resources. The architecture of this fault diagnosis system is presented, along with its Client-Master-Worker computation model and diagnostic workflow. Finally, a prototype system is implemented using the data-driven middleware BitDew, in which multiple fault diagnosis grid services are integrated into a unified Web portal. A case study of data-driven fault data analysis conducted on the prototype demonstrates the effectiveness of the system.
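The Client-Master-Worker model named in the abstract follows a familiar pattern: clients submit diagnosis tasks, a master queues them, and workers pull and process them. The sketch below is a generic, single-process illustration of that pattern only; it does not use the real BitDew API, and the task fields and the diagnosis routine are placeholders.

    // Generic Client-Master-Worker sketch (not the BitDew API).
    import java.util.concurrent.*;

    public class MasterWorkerDemo {
        record DiagnosisTask(String equipmentId, String faultData) {}

        public static void main(String[] args) throws Exception {
            BlockingQueue<DiagnosisTask> tasks = new LinkedBlockingQueue<>();
            BlockingQueue<String> results = new LinkedBlockingQueue<>();

            // Client side: submit fault data collected from equipment
            tasks.put(new DiagnosisTask("pump-07", "vibration spectrum ..."));
            tasks.put(new DiagnosisTask("motor-12", "temperature trend ..."));

            // Worker side: each worker pulls tasks and runs a diagnosis routine
            ExecutorService workers = Executors.newFixedThreadPool(2);
            for (int w = 0; w < 2; w++) {
                workers.submit(() -> {
                    DiagnosisTask t;
                    while ((t = tasks.poll(1, TimeUnit.SECONDS)) != null) {
                        // placeholder for a real diagnostic algorithm
                        results.put(t.equipmentId() + ": no fault detected");
                    }
                    return null;
                });
            }
            workers.shutdown();
            workers.awaitTermination(5, TimeUnit.SECONDS);

            // Master side: collect and report results
            for (String r : results) System.out.println(r);
        }
    }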
ActiveSort: Efficient external sorting using active SSDs in the MapReduce framework
Lee, Y.S., Quero, L.C., Kim, S.H., Kim, J.S., Maeng, S. North-Holland 2016 Future generations computer systems Vol.65 No.-
In recent decades, there has been an explosion in the volume of data to be processed by data-intensive computing applications. As a result, processing I/O operations efficiently has become an important challenge. SSDs (solid state drives) are an effective solution that not only improves I/O throughput but also reduces the amount of I/O transfer through the concept of active SSDs. Active SSDs offload to the SSD a part of the data-processing tasks usually performed on the host. Offloading data-processing tasks removes extra data transfer and improves overall data-processing performance. In this work, we propose ActiveSort, a novel mechanism that improves external sorting using the concept of active SSDs. External sorting is used extensively in data-intensive computing frameworks such as Hadoop. By performing merge operations on-the-fly within the SSD, ActiveSort reduces the amount of I/O transfer and improves the performance of external sorting in Hadoop. Our evaluation results on a real SSD platform indicate that Hadoop applications using ActiveSort outperform the original Hadoop by up to 36.1%. ActiveSort reduces the amount of writes by up to 40.4%, thereby improving the lifetime of the SSD.
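The merge step that ActiveSort moves into the SSD is a standard k-way merge of sorted runs, where each run stands in for a sorted intermediate file produced by external sorting. The sketch below is a host-side illustration of that core operation only; in ActiveSort the merging happens inside the device, and the run contents here are toy data.

    // Illustrative k-way merge of sorted runs using a min-heap.
    import java.util.*;

    public class KWayMerge {
        public static List<Integer> merge(List<Iterator<Integer>> runs) {
            // heap entry: { value, runIndex }
            PriorityQueue<int[]> heap =
                    new PriorityQueue<>(Comparator.comparingInt(e -> e[0]));
            for (int i = 0; i < runs.size(); i++)
                if (runs.get(i).hasNext())
                    heap.add(new int[] { runs.get(i).next(), i });

            List<Integer> out = new ArrayList<>();
            while (!heap.isEmpty()) {
                int[] top = heap.poll();
                out.add(top[0]);
                Iterator<Integer> run = runs.get(top[1]);
                if (run.hasNext()) heap.add(new int[] { run.next(), top[1] });
            }
            return out;
        }

        public static void main(String[] args) {
            List<Iterator<Integer>> runs = List.of(
                    List.of(1, 4, 7).iterator(),
                    List.of(2, 5, 8).iterator(),
                    List.of(3, 6, 9).iterator());
            System.out.println(merge(runs)); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
        }
    }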