
Home BigData. Part 1. Practice Spark Streaming on an AWS Cluster

Hello.



There are many services on the Internet that provide cloud computing resources. With their help, you can master Big Data technologies at home.



In this article, we will install Apache Kafka, Apache Spark, ZooKeeper, and spark-shell on the AWS (Amazon Web Services) EC2 platform and learn how to use them all.






Introduction to the Amazon Web Services Platform



Register at aws.amazon.com/console. Enter a user name and remember the password.



Configure the node instances for the ZooKeeper and Kafka services.
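
The instances themselves are created through the EC2 console. For reference only, a roughly equivalent AWS CLI call is sketched below; the AMI ID, instance type, and security group name are placeholders, not values from this setup.

 # Hypothetical example: launch an Ubuntu instance for the ZooKeeper/Kafka node.
 # The AMI ID, instance type, and security group are placeholders.
 aws ec2 run-instances \
     --image-id ami-0abcdef1234567890 \
     --instance-type t2.medium \
     --count 1 \
     --key-name HadoopUser01 \
     --security-groups my-hadoop-sg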





Creating keys
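
Key pairs are created in the EC2 console under Network & Security / Key Pairs. As a sketch, the same can be done with the AWS CLI; the key name here matches the one used later for SSH:

 # Create a key pair and save the private key locally.
 aws ec2 create-key-pair --key-name HadoopUser01 \
     --query 'KeyMaterial' --output text > HadoopUser01.pem
 chmod 400 HadoopUser01.pem
 # PuTTY users convert the .pem file to .ppk with PuTTYgen.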





Cluster startup



For convenience, rename the cluster nodes using the Node01-04 notation. To connect to the cluster nodes from the local computer via SSH, you need to determine each node's IP address and its public/private DNS name. Select each cluster node in turn and, for the selected instance, write down its public/private DNS name (used for connecting via SSH and for installing software) to the text file HadoopAdm01.txt.



Example: ec2-35-162-169-76.us-west-2.compute.amazonaws.com
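
From a Linux or macOS machine, the connection with the OpenSSH client might look like this (the .pem file is the key pair created earlier; on Windows, PuTTY with the .ppk file is used instead):

 # Connect to Node01 using the public DNS name recorded in HadoopAdm01.txt.
 ssh -i HadoopUser01.pem ubuntu@ec2-35-162-169-76.us-west-2.compute.amazonaws.com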



Installing Apache Kafka in Single-Node Mode on an AWS Cluster Node



To install the software, select our node and copy its Public DNS for the SSH connection. Configure the SSH connection in PuTTY: use the saved name of the first node and the private/public key pair “HadoopUser01.ppk” created in section 1.3. Go to the Connection / SSH / Auth section, click the Browse button, and locate the folder where we previously saved the “HadoopUser01.ppk” file.



Save the connection configuration in the session settings.



Connect to the node and log in as user ubuntu.
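
The installation itself comes down to downloading the Kafka distribution and starting ZooKeeper and the Kafka broker. A minimal sketch, assuming Kafka 2.1.0 built for Scala 2.11 (the exact version may differ):

 # Download and unpack the Kafka distribution.
 wget https://archive.apache.org/dist/kafka/2.1.0/kafka_2.11-2.1.0.tgz
 tar -xzf kafka_2.11-2.1.0.tgz
 cd kafka_2.11-2.1.0
 # Start ZooKeeper first, then the Kafka broker, both in the background.
 bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
 bin/kafka-server-start.sh -daemon config/server.properties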





First joy



Create your first topic on the assembled Kafka server.
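
With the broker running, a topic is created with the kafka-topics.sh tool shipped with Kafka; the topic name here is just an example:

 # Create a single-partition topic with replication factor 1.
 bin/kafka-topics.sh --create --zookeeper localhost:2181 \
     --replication-factor 1 --partitions 1 --topic test
 # Verify that the topic exists.
 bin/kafka-topics.sh --list --zookeeper localhost:2181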





Configuring Apache Spark on a single-node cluster



We have prepared a node instance on AWS with the ZooKeeper and Kafka services installed; now we need to install Apache Spark. To do this:



Download the Apache Spark distribution (version 2.4.0 is used here):



 wget https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.6.tgz 
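
After the download completes, unpack the archive and add Spark to the PATH. Installing under /opt/spark is a common convention, not a requirement:

 # Unpack the distribution and move it to /opt/spark.
 tar -xzf spark-2.4.0-bin-hadoop2.6.tgz
 sudo mv spark-2.4.0-bin-hadoop2.6 /opt/spark
 # Make spark-shell and spark-submit available in the current shell.
 export SPARK_HOME=/opt/spark
 export PATH=$SPARK_HOME/bin:$PATH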






A bit of creativity



Download the Scala-IDE editor (from scala-ide.org). Launch it and start writing code. I will not go into detail here, since there is already a good article on Habr.
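
To experiment with Kafka from Spark without setting up a full project, you can also start spark-shell with the Kafka integration package for Spark 2.4.0 / Scala 2.11 (these are the coordinates of the DStream-based connector; Structured Streaming uses spark-sql-kafka-0-10 instead):

 # Start an interactive shell with the Kafka connector on the classpath.
 spark-shell --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.4.0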



Useful literature and courses:



courses.hadoopinrealworld.com/courses/enrolled/319237

data-flair.training/blogs/kafka-consumer

www.udemy.com/apache-spark-with-scala-hands-on-with-big-data

Source: https://habr.com/ru/post/452752/


