Running Apache Hadoop on GCP (Page: 1)

Google Cloud Platform provides the Cloud Dataproc service for running Apache Hadoop and Apache Spark jobs. In this tutorial I will show how to set up a simple cluster that runs on a few parallel nodes provided by Google Cloud Platform. Later on I will use this cluster to demonstrate simple clustering calculations.

Setting up Hadoop Cluster on the Cloud

First of all, we need to initialize a new cluster with Hadoop on GCP (Google Cloud Platform). I’m assuming that you are already registered with GCP.

Create new project

1 – Click the project selector in the top left corner to open the project selection tool

2 – Click “NEW PROJECT” to create a new project

Select name for a new project

1 – Type the name of the new project (I chose hadoop)

2 – Click “CREATE”
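The same project could also be created from a script instead of the console. The following is a minimal sketch, assuming the Google Cloud SDK (`gcloud`) is installed and authenticated. It only assembles the command and does not run it. The project ID `hadoop-demo-12345` is a hypothetical placeholder, because project IDs must be globally unique; `hadoop` is the display name chosen above.

```python
import shlex
from typing import List


def build_create_project_command(project_id: str, display_name: str) -> List[str]:
    """Assemble the gcloud command that creates a new GCP project."""
    return ["gcloud", "projects", "create", project_id, f"--name={display_name}"]


# Hypothetical project ID; "hadoop" is the display name used in this tutorial.
command = build_create_project_command("hadoop-demo-12345", "hadoop")
print(shlex.join(command))
```

To actually create the project, the list can be passed to `subprocess.run(command, check=True)`.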

Select type of the new project

0 – Make sure that you are working in the project you just created

1 – In the left menu, find the Dataproc option

2 – Click on Clusters; we will use clusters for our Hadoop project.

Select type of the new cluster

1 – If you are not redirected automatically, go to Google Cloud Platform, select Dataproc in the left menu and then click on Clusters

2 – Select Set up cluster

3 – Enter a name for your cluster

4 – Choose the region for your cluster. It is better to select one located near you

5 – Zone: use any zone

6 – Select the type of the cluster. For our purposes, we will use Standard: 1 master, N workers

Configure parameters of your nodes

1 – Click on Configure nodes

2 – Choose the smallest machine type, with 2 vCPU and 7.5 GB of memory. The test task does not need anything more powerful.

Configure parameters of the disks for nodes

1 – Select 30 GB for the primary disk size of the master node. A test task needs only the smallest disk, and 30 GB is the smallest size for a disk that holds the operating system.

2 – Select 30 GB for the primary disk size of the worker nodes. Again, this is the minimum size for a disk that holds the operating system

3 – Make sure that you have 2 nodes. Again, 2 nodes are enough for test purposes

4 – Click on Create and wait while GCP creates the cluster for you
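The console settings above can be sketched as a single `gcloud dataproc clusters create` command. This is an illustration under stated assumptions, not the exact command the console issues: `n1-standard-2` is assumed to be the machine type behind "2 vCPU, 7.5 GB memory", the "2 nodes" are assumed to be worker nodes, and the cluster name, region, zone and project ID are hypothetical placeholders. The code only assembles the command and does not run it.

```python
import shlex
from typing import List


def build_create_cluster_command(
    cluster_name: str,
    region: str,
    zone: str,
    project_id: str,
    machine_type: str = "n1-standard-2",  # assumed: 2 vCPU, 7.5 GB memory
    boot_disk_size: str = "30GB",         # smallest disk size used in the tutorial
    num_workers: int = 2,                 # assumed: the "2 nodes" are workers
) -> List[str]:
    """Assemble a gcloud command for a standard cluster (1 master, N workers)."""
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        f"--project={project_id}",
        f"--region={region}",
        f"--zone={zone}",
        f"--master-machine-type={machine_type}",
        f"--master-boot-disk-size={boot_disk_size}",
        f"--num-workers={num_workers}",
        f"--worker-machine-type={machine_type}",
        f"--worker-boot-disk-size={boot_disk_size}",
    ]


# All four values below are hypothetical placeholders.
command = build_create_cluster_command(
    cluster_name="hadoop-cluster",
    region="europe-west1",
    zone="europe-west1-b",
    project_id="hadoop-demo-12345",
)
print(shlex.join(command))
```

Keeping the master and worker settings as parameters makes it easy to scale the same sketch up later without touching the console.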

After some time the cluster will be ready.

The cluster should be ready for further use in less than a minute.
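Instead of watching the console, the cluster state could also be checked from a script. The sketch below assumes the Google Cloud SDK is installed; it builds a `gcloud dataproc clusters describe` command with JSON output and reads the `status.state` field, which is `RUNNING` once the cluster is ready. The function names are my own and are not part of any library.

```python
import json
from typing import List


def build_describe_cluster_command(cluster_name: str, region: str) -> List[str]:
    """Assemble the gcloud command that prints the cluster description as JSON."""
    return [
        "gcloud", "dataproc", "clusters", "describe", cluster_name,
        f"--region={region}",
        "--format=json",
    ]


def cluster_is_running(describe_output: str) -> bool:
    """Return True if the JSON description reports the RUNNING state."""
    cluster = json.loads(describe_output)
    return cluster.get("status", {}).get("state") == "RUNNING"
```

A caller would run the command with `subprocess.run(..., capture_output=True, text=True)` and pass its `stdout` to `cluster_is_running`.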

Setting up network for access

When the cluster is ready, you need to set up the network so that you can access and browse the cluster. A Virtual Private Cloud (VPC) network is a virtual version of a physical network, implemented inside Google's production network.

Configuring VPC network for cluster access.

1 – In the left menu, select VPC network from the NETWORKING section

2 – Click on VPC network

Configuring firewall for instance access

1 – Click on Firewall in the left menu

2 – Click on CREATE FIREWALL RULE at the top of the page

Important details about firewall configuration

1 – This firewall rule will apply to every instance in our system. Therefore we need to set the target to All instances in the network to have access to everything within our cluster

2 – You need to provide the IP address from which you will access this cluster. You can use IPTOOL to check your IP address. If you have a dynamic IP, contact your provider and ask for a fixed address or for the IP range.

3 – Specify the ports for access. Select the option Specified protocols and ports and enter ports 8088 (the YARN ResourceManager web UI) and 9870 (the HDFS NameNode web UI)
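The three settings above map onto one `gcloud compute firewall-rules create` command. The following is a sketch under assumptions: the rule name `allow-hadoop-ui`, the `default` network and the address `203.0.113.7` (a documentation-only IP) are placeholders, not values from this tutorial. Omitting target tags makes the rule apply to all instances in the network. The standard `ipaddress` module turns a single address into the CIDR form that `--source-ranges` expects and rejects malformed input.

```python
import ipaddress
import shlex
from typing import Iterable, List


def build_firewall_rule_command(
    rule_name: str,
    source_ip: str,
    ports: Iterable[int] = (8088, 9870),  # YARN and HDFS NameNode web UIs
    network: str = "default",             # assumed network name
) -> List[str]:
    """Assemble a gcloud command that allows TCP ingress from one IP or range."""
    # A bare address such as 203.0.113.7 becomes 203.0.113.7/32.
    source_range = str(ipaddress.ip_network(source_ip, strict=False))
    allow = ",".join(f"tcp:{port}" for port in ports)
    return [
        "gcloud", "compute", "firewall-rules", "create", rule_name,
        f"--network={network}",
        "--direction=INGRESS",
        f"--allow={allow}",
        f"--source-ranges={source_range}",
    ]


# Hypothetical rule name and source address.
command = build_firewall_rule_command("allow-hadoop-ui", "203.0.113.7")
print(shlex.join(command))
```

Restricting the source range to your own address matters here, because both web UIs are unauthenticated by default.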

Check the firewall status

After all these steps you should see the information about your firewall and be able to confirm that it is running.
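Beyond the console status, a quick way to confirm that the rule works is to try opening a TCP connection to both ports from your own machine. This is a minimal sketch using only the Python standard library; the function names are my own, and the host you pass in is assumed to be the external IP address of the master node, which this tutorial does not show.

```python
import socket
from typing import Dict, Iterable


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_hadoop_ports(host: str, ports: Iterable[int] = (8088, 9870)) -> Dict[int, bool]:
    """Check the YARN (8088) and HDFS NameNode (9870) web UI ports."""
    return {port: port_is_open(host, port) for port in ports}


# Example call (replace the placeholder with your master node's external IP):
# print(check_hadoop_ports("MASTER_EXTERNAL_IP"))
```

If both values come back `True`, the web UIs should also open in a browser at `http://<master external IP>:8088` and `http://<master external IP>:9870`.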



Published: 2021-11-24 06:10:16