Flume pro 2-6


Responsible for architecting Hadoop clusters with CDH4 on CentOS and managing them with Cloudera Manager.

Loaded large volumes of row- and column-oriented data into HBase.

Created HBase tables to store the variable data formats of input data coming from different portfolios.

Responsible for loading customer data and event logs into HBase using the Java API.

Created HDFS snapshots for data backup, protection against user errors, and disaster recovery.

Environment: Hadoop 2.4.x, HDFS, MapReduce 2.4.0, YARN 2.6.2, Pig 0.14.0, Hive 0.13.0, HBase 0.94.0, Sqoop 1.99.2, Flume 1.5.0, Oozie 4.0.0, Zookeeper 3.4.2, Cassandra, MongoDB, Spark 1.1.1, Kafka 0.8.1.

Worked with the Oozie workflow engine for job scheduling.

Applied MLlib to build statistical models for classification and prediction.

Configured Spark to optimize data processing.

Extracted files from Cassandra and MongoDB through Sqoop, placed them in HDFS, and processed them.


Stored and rapidly updated data in HBase, providing key-based access to specific data.

Used Flume to collect, aggregate, and store dynamic web log data from different sources, such as web servers and mobile devices, and pushed it to HDFS.

Created multiple Hive tables and implemented partitioning, dynamic partitioning, and bucketing in Hive for efficient data access.
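As a rough illustration of why partitioning and bucketing speed up access, the sketch below lays rows out the way a partitioned, bucketed table groups them: one directory per partition value, and a fixed number of bucket files chosen by hashing a key. It is plain Python, not Hive; the column names, the bucket count, and the CRC32 hash are assumptions made for the example (Hive uses its own hash function).

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count for the example


def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Pick a bucket for a key.

    CRC32 is a stand-in hash chosen because it is deterministic across runs;
    it is not the hash function Hive itself uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def layout(rows, partition_col, bucket_col):
    """Group rows under 'partition_col=value/bucket_N' style paths."""
    files = defaultdict(list)
    for row in rows:
        bucket = bucket_for(str(row[bucket_col]))
        path = f"{partition_col}={row[partition_col]}/bucket_{bucket}"
        files[path].append(row)
    return dict(files)


# Hypothetical sample rows: event date is the partition, customer id the bucket key.
rows = [
    {"event_date": "2015-01-01", "customer_id": "c1", "amount": 10},
    {"event_date": "2015-01-01", "customer_id": "c2", "amount": 20},
    {"event_date": "2015-01-02", "customer_id": "c1", "amount": 30},
]

table_files = layout(rows, "event_date", "customer_id")
for path in sorted(table_files):
    print(path, len(table_files[path]))
```

A query filtered on `event_date` only needs to read one directory, and a lookup or join on `customer_id` only needs the matching bucket within it.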

Used Pig UDFs to do data manipulation, transformations, joins and some pre-aggregations.

Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and compressed file formats.

Configured Zookeeper and worked on Hadoop High Availability with the Zookeeper failover controller, adding support for a scalable, fault-tolerant data solution.

Installed, configured, monitored, and maintained a Hadoop cluster on a Big Data platform.
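The extract, transform, and aggregate pattern described above can be sketched in a few lines. This is a single-process Python simulation of the map, shuffle, and reduce phases, not the Hadoop Java API the original jobs used; the `portfolio` and `amount` fields and the sample inputs are hypothetical.

```python
import csv
import io
import json
from collections import defaultdict


def parse_csv(text):
    """Extraction step for CSV input: yield one dict per data row."""
    yield from csv.DictReader(io.StringIO(text))


def parse_json_lines(text):
    """Extraction step for JSON input: one JSON object per non-empty line."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


def map_phase(records):
    """Transform each record into a (key, value) pair."""
    for rec in records:
        yield rec["portfolio"], float(rec["amount"])


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(csv_text, json_text):
    records = list(parse_csv(csv_text)) + list(parse_json_lines(json_text))
    return reduce_phase(shuffle(map_phase(records)))


# Hypothetical inputs in two different formats.
CSV_INPUT = "portfolio,amount\nretail,10.5\ntelecom,4\n"
JSON_INPUT = (
    '{"portfolio": "retail", "amount": 2.5}\n'
    '{"portfolio": "telecom", "amount": 1}\n'
)

totals = run_job(CSV_INPUT, JSON_INPUT)
print(totals)
```

Keeping the format-specific parsing separate from the map step means a new input format only needs a new parser, while the aggregation logic stays unchanged.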


Operating Systems: Linux (CentOS, Ubuntu), Mac OS, Windows

Methodologies: Agile, Waterfall

Service Programming: Zookeeper 3.3.6

Tools: Eclipse, Git, Maven, Tableau

Languages: Java, Python, Scala, UNIX Shell Scripting, SQL, C, C++

Scheduling: Oozie 4.0.x, Falcon

SQL on Hadoop: Hive 0.12, Cloudera Impala 2.0.x

Data Ingestion / ETL Tools: Flume 1.3.x, Sqoop 1.4.4, Storm 0.9, Kafka 0.8

Relational Databases: Oracle 11g/10g/9i, MySQL 5.0, SQL Server

Distribution based on Hadoop: Cloudera Distribution (CDH4, CM)

Distributed File System: HDFS 2.6.0

Distributed Programming: MapReduce 2.6.x, Pig 0.12, Spark 1.3

Hadoop Library: Mahout, MLlib

NoSQL Databases: HBase 0.98, MongoDB, Cassandra

Successful in fast-paced environments, working both independently and in collaborative teams.

Knowledge of social networks and graph theory.

Fluent in data mining and machine learning, including classification, clustering, regression, and anomaly detection.

Experienced in Agile and Waterfall methodologies.

Presented data visually using Tableau.

Used Maven as the source build framework.

In-depth understanding of scalable machine learning libraries such as Apache Mahout and MLlib.

Consolidated MapReduce jobs by implementing them in Spark, decreasing data processing time.

Scheduled workflows using the Oozie workflow engine.

Extracted data from log files and pushed it into HDFS using Flume.

Implemented Sqoop jobs to migrate large sets of structured and semi-structured data between HDFS and other data stores such as Hive or an RDBMS.

Used NoSQL databases including HBase, MongoDB, and Cassandra.

Experience integrating various data sources in RDBMSs such as Oracle and SQL Server.

Developed real-time read/write access to very large datasets via HBase.

Wrote ad hoc queries for analyzing data using HiveQL.

Extended Pig and Hive core functionality by writing custom UDFs.

Experience in analyzing data using HiveQL, HBase, and custom MapReduce programs in Java.

Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as Hadoop 2.x, MapReduce 2.x, HDFS, HBase, Oozie, Hive, Kafka, Zookeeper, Spark, Storm, Sqoop, and Flume.

Excellent understanding of Hadoop architecture and its components, such as HDFS, YARN, and High Availability, and of the MapReduce programming paradigm.

Worked in various domains, including luxury and telecommunications.

Over 5 years of working experience, including 3+ years in Hadoop development and 2+ years as a Data Analyst.