Flume Installation and a Simple Example

Installing Flume

- Download the Flume binary under the account of your choice (I created a bigdata account on CentOS 7 and installed it there)

$ wget http://mirror.navercorp.com/apache/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz

- Extract the downloaded ‘apache-flume-1.8.0-bin.tar.gz’ file

$ tar xvfz apache-flume-1.8.0-bin.tar.gz

- Log in as root and open /etc/profile

# vi /etc/profile

- Add the Flume binary path to /etc/profile as shown below, then save the file

export FLUME_HOME=/home/bigdata/apache-flume-1.8.0-bin
export PATH=$PATH:$FLUME_HOME/bin

- Apply /etc/profile with source

# source /etc/profile
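To confirm that the profile change took effect, a quick check like the one below can help. It is only a sketch: the default FLUME_HOME is the install path used in this post, and the path_status variable is a name made up for this example. Keep in mind that source only affects the current shell, so the bigdata account sees the new PATH after its next login.

```shell
# Assumption: the install path used in this post; override FLUME_HOME if yours differs.
FLUME_HOME="${FLUME_HOME:-/home/bigdata/apache-flume-1.8.0-bin}"

# Check whether $FLUME_HOME/bin is one of the PATH entries.
case ":$PATH:" in
  *":$FLUME_HOME/bin:"*) path_status="on PATH" ;;
  *)                     path_status="missing from PATH" ;;
esac
echo "FLUME_HOME=$FLUME_HOME ($path_status)"

# Print the Flume version only if the launcher can actually be found.
if command -v flume-ng >/dev/null 2>&1; then
  flume-ng version
else
  echo "flume-ng not found; open a new login shell or re-run: source /etc/profile" >&2
fi
```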

- Log in with the account that Flume was installed under and move to Flume's conf directory

$ cd /home/bigdata/apache-flume-1.8.0-bin/conf

- Copy flume-env.sh.template to flume-env.sh

$ cp flume-env.sh.template flume-env.sh

- Open flume-env.sh and set the Java heap memory as shown below

export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"

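- Apply flume-env.sh with source (this only affects the current shell; flume-ng also reads conf/flume-env.sh on its own when it is started with -c conf)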

$ source flume-env.sh

Testing Flume's Collection Feature

Before loading data into HDFS, test Flume's collection feature on its own

- Create a test.conf file in Flume's conf directory

[bigdata@server01 conf]$ vi test.conf

- Enter the Agent configuration below into test.conf

○ Explanation of the Flume Agent configuration file

- Agent name: Test_Agent

- Test_Agent layout: Source-Channel-Sink

Test_Agent.sources = TestSource_SpoolSource
Test_Agent.channels = TestChannel_Channel
Test_Agent.sinks = TestSink_LoggerSink

• Declares the names of the Source, Channel, and Sink resources that the Flume agent will use

• The Test_Agent Source is named TestSource_SpoolSource

• The Test_Agent Channel is named TestChannel_Channel

• The Test_Agent Sink is named TestSink_LoggerSink

Test_Agent.sources.TestSource_SpoolSource.type = spooldir
Test_Agent.sources.TestSource_SpoolSource.spoolDir = /home/bigdata/working/batch-log
Test_Agent.sources.TestSource_SpoolSource.deletePolicy = immediate
Test_Agent.sources.TestSource_SpoolSource.batchSize = 1000

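• Source settings for Test_Agent

• spooldir: watches a directory and collects the contents of each new file that appears in it

• Collects the log files created under “/home/bigdata/working/batch-log”

• deletePolicy: with immediate, a file is removed from the directory (batch-log) as soon as it has been fully collected

• Reads up to batchSize events at a time and sends them to the Channel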
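The spooldir source expects its directory to exist before the agent starts; otherwise the source fails at startup. A minimal sketch for preparing it, assuming the agent runs as the bigdata account so that $HOME/working/batch-log resolves to the spoolDir path used in this post (SPOOL_DIR is a variable name chosen for this example):

```shell
# Assumption: run as the account that owns the spool directory (bigdata in this post).
SPOOL_DIR="${SPOOL_DIR:-$HOME/working/batch-log}"

# Create the directory (and any missing parents) before starting the agent.
mkdir -p "$SPOOL_DIR"

# Show owner and permissions; the agent's account needs read and write access,
# since deletePolicy = immediate removes files after they are collected.
ls -ld "$SPOOL_DIR"
```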

Test_Agent.channels.TestChannel_Channel.type = memory
Test_Agent.channels.TestChannel_Channel.capacity = 100000
Test_Agent.channels.TestChannel_Channel.transactionCapacity = 10000

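• Channel settings for Test_Agent

• Sets the Channel type to “memory”

• Memory Channel: buffers the data received from the Source in memory. It is fast, but because events live only in memory they can be lost if the agent process stops or crashes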
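The three sizes in this configuration are related: the source's batchSize should not exceed the channel's transactionCapacity, and transactionCapacity cannot exceed capacity. A small sketch that checks the values from this post (the shell variable names are made up for the example):

```shell
# Values copied from test.conf in this post.
batch_size=1000              # TestSource_SpoolSource.batchSize
transaction_capacity=10000   # TestChannel_Channel.transactionCapacity
capacity=100000              # TestChannel_Channel.capacity

# A batch must fit into one channel transaction,
# and a transaction must fit into the channel itself.
if [ "$batch_size" -le "$transaction_capacity" ] &&
   [ "$transaction_capacity" -le "$capacity" ]; then
  sizing="ok"
else
  sizing="invalid"
fi
echo "channel sizing: $sizing"
```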

Test_Agent.sinks.TestSink_LoggerSink.type = logger

• Sink settings for Test_Agent

• Logger Sink: intended for testing; writes the collected data to Flume's log

Test_Agent.sources.TestSource_SpoolSource.channels = TestChannel_Channel
Test_Agent.sinks.TestSink_LoggerSink.channel = TestChannel_Channel

• Connects the Spooldir Source to the Memory Channel

• Connects the Logger Sink to the Memory Channel
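Since the configuration is explained piece by piece above, it may help to see it assembled into a single file. The sketch below writes the same settings in one go and then checks that the source and the sink point at the same channel. CONF_FILE, src_channel, and sink_channel are names chosen for this example, and the default output path is the current directory; in this walkthrough the file lives at conf/test.conf.

```shell
# Assumption: output location; the walkthrough keeps this file at conf/test.conf.
CONF_FILE="${CONF_FILE:-./test.conf}"

cat > "$CONF_FILE" <<'EOF'
# Component names
Test_Agent.sources = TestSource_SpoolSource
Test_Agent.channels = TestChannel_Channel
Test_Agent.sinks = TestSink_LoggerSink

# Source: spooling directory
Test_Agent.sources.TestSource_SpoolSource.type = spooldir
Test_Agent.sources.TestSource_SpoolSource.spoolDir = /home/bigdata/working/batch-log
Test_Agent.sources.TestSource_SpoolSource.deletePolicy = immediate
Test_Agent.sources.TestSource_SpoolSource.batchSize = 1000

# Channel: in-memory buffer
Test_Agent.channels.TestChannel_Channel.type = memory
Test_Agent.channels.TestChannel_Channel.capacity = 100000
Test_Agent.channels.TestChannel_Channel.transactionCapacity = 10000

# Sink: write events to Flume's log
Test_Agent.sinks.TestSink_LoggerSink.type = logger

# Wiring: a source takes "channels" (plural), a sink takes "channel" (singular)
Test_Agent.sources.TestSource_SpoolSource.channels = TestChannel_Channel
Test_Agent.sinks.TestSink_LoggerSink.channel = TestChannel_Channel
EOF

# Sanity check: the source and the sink must reference the same channel name.
src_channel=$(sed -n 's/^Test_Agent\.sources\.TestSource_SpoolSource\.channels *= *//p' "$CONF_FILE")
sink_channel=$(sed -n 's/^Test_Agent\.sinks\.TestSink_LoggerSink\.channel *= *//p' "$CONF_FILE")
if [ "$src_channel" = "$sink_channel" ]; then
  echo "channel wiring ok: $src_channel"
else
  echo "channel mismatch: source=$src_channel sink=$sink_channel" >&2
fi
```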

- After saving the file, move to Flume's home directory (apache-flume-1.8.0-bin)

- Run Test_Agent

flume-ng agent -c <conf directory> -f <conf file> -n <agent name>

[bigdata@server01 apache-flume-1.8.0-bin]$ ./bin/flume-ng agent -c conf -f conf/test.conf -n Test_Agent
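With the log4j settings shipped in the 1.8.0 binary, the logger sink's output should end up in a log file under the logs directory rather than on the terminal, as far as I know. To watch events directly in the console while testing, the root logger can be overridden on the command line. A sketch, assuming the install path from this post:

```shell
# Assumption: the install path used in this post; override FLUME_HOME if yours differs.
FLUME_HOME="${FLUME_HOME:-/home/bigdata/apache-flume-1.8.0-bin}"

if [ -x "$FLUME_HOME/bin/flume-ng" ]; then
  cd "$FLUME_HOME" || exit 1
  # Same command as above, plus a Java system property that sends
  # INFO-level log output (including logger sink events) to the console.
  # The agent runs in the foreground; stop it with Ctrl+C.
  ./bin/flume-ng agent -c conf -f conf/test.conf -n Test_Agent \
    -Dflume.root.logger=INFO,console
else
  echo "flume-ng not found under $FLUME_HOME" >&2
fi
```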

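- Upload a file to /home/bigdata/working/batch-log with FileZilla

- As soon as the file is uploaded, the running Flume agent picks it up and starts collecting it

• Confirm that each Flume event consists of a header and a body

• The movement of the data can be checked right away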
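The spooldir source expects a file to be complete and unchanged once it appears in the spool directory, and it expects file names not to be reused. If a direct upload is slow, the agent may see a half-written file. One way around this, sketched below, is to write the file somewhere else on the same filesystem and then move it in with a single rename. STAGING_DIR, SPOOL_DIR, and the sample file name are hypothetical names for this example, and the paths assume the bigdata account.

```shell
# Assumption: both directories are on the same filesystem, so mv is a plain rename.
SPOOL_DIR="${SPOOL_DIR:-$HOME/working/batch-log}"
STAGING_DIR="${STAGING_DIR:-$HOME/working/staging}"
mkdir -p "$SPOOL_DIR" "$STAGING_DIR"

# Use a timestamped name so the same file name is not reused.
sample="sample-$(date +%Y%m%d%H%M%S).log"

# Write the complete file outside the spool directory first.
printf 'first test line\nsecond test line\n' > "$STAGING_DIR/$sample"

# Then move it in; the agent only ever sees the finished file.
mv "$STAGING_DIR/$sample" "$SPOOL_DIR/$sample"
echo "moved $sample into $SPOOL_DIR"
```

If the agent is running with deletePolicy = immediate, the file should disappear from the spool directory shortly after it has been collected.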
