Giovanni Bricconi

My site on WordPress.com

Archive for February 2022

Learning Spark 2nd edition

I was searching the O’Reilly site for a book on Spark and found this one. Luckily, the PDF is available online for free: https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf

Written by Giovanni

February 28, 2022 at 2:52 pm

Posted in Varie

port forwarding and Ubuntu firewall for Hadoop

Even with port forwarding configured, I still had to use Firefox inside the VM to reach the Hadoop web UIs.

The problem was the Ubuntu firewall (ufw). Allowing each UI port fixed it:

# ufw allow 50075
Rule added
# ufw allow 18080
Rule added
# ufw allow 50070
Rule added
# ufw allow 8042
Rule added
# ufw allow 8088
Rule added
# ufw allow 50090
Rule added
# ufw allow 4040
Rule added
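The same rules can be applied with a small loop instead of one command per port. This is a minimal sketch, assuming bash and sudo rights; the function name open_ui_ports is made up for illustration, and restricting the rules to TCP is an assumption (the web UIs speak HTTP over TCP, while the plain `ufw allow PORT` commands above open both TCP and UDP).

```shell
#!/usr/bin/env bash
# Hypothetical helper: open all Hadoop/Spark web UI ports with one call.
# Same ports as the ufw commands above, restricted to TCP (an assumption).
HADOOP_SPARK_UI_PORTS=(50070 50075 50090 8088 8042 18080 4040)

open_ui_ports() {
  local port
  for port in "${HADOOP_SPARK_UI_PORTS[@]}"; do
    # Each call prints "Rule added" (plus "Rule added (v6)" if IPv6 is on)
    sudo ufw allow "${port}/tcp"
  done
}
```

After running `open_ui_ports`, `sudo ufw status` lists the active rules, which is a quick way to confirm nothing was missed.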

Written by Giovanni

February 24, 2022 at 9:58 am

Posted in Varie

Hadoop and Spark on Ubuntu

Hi, I am about to start working on a big data project, so I am setting up my environment.

As usual, the versions in use are quite old: once a project is running, it is difficult to upgrade it to the latest versions.

I am quite old, so I did not think of using Docker images to set things up; when it occurred to me, I looked for some Hadoop images. A quick Google search today did not turn up any official images, so I kept my own environment set-up.

So, an Ubuntu server that runs without a GUI:

sudo systemctl set-default multi-user.target

This does the magic. Then I set the DISPLAY environment variable so that MobaXterm serves X11. This way I can run IntelliJ from the box, with its windows mixed in with my Windows applications. The same goes for gedit etc.
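Setting DISPLAY boils down to pointing X11 clients at the X server running on the Windows host. A minimal sketch, assuming bash; the function name set_x11_display is hypothetical, and the host IP is whatever address the VM uses to reach Windows (it depends on how the VM network is configured).

```shell
#!/usr/bin/env bash
# Hypothetical helper: send X11 windows from the VM to MobaXterm on Windows.
# $1 = IP of the Windows host as seen from the VM (assumption, setup-specific)
# $2 = X display number, 0 by default
set_x11_display() {
  local host=${1:?usage: set_x11_display <windows-host-ip> [display-number]}
  local display=${2:-0}
  # X11 DISPLAY format is host:display.screen
  export DISPLAY="${host}:${display}.0"
}
```

Putting the function and a call such as `set_x11_display 192.168.56.1` (example address only) in `~/.bashrc` makes every new shell ready, so `gedit &` opens its window on the Windows desktop. MobaXterm's X server may also need to be configured to accept connections coming from the VM.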

Then I installed the glorious spark-2.1.0-bin-hadoop2.7 and hadoop-2.0.7. The set-up took a lot of time; luckily there are plenty of step-by-step guides.

This one is very nice https://phoenixnap.com/kb/install-hadoop-ubuntu
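Guides like this one end with a few environment variables in `~/.bashrc`. A minimal sketch of that step, assuming both tarballs were unpacked under the home directory (the paths are hypothetical, adjust them to the real install locations):

```shell
# Sketch of ~/.bashrc additions; install directories are hypothetical.
export HADOOP_HOME="$HOME/hadoop"
# Spark reads the HDFS/YARN client configuration from here
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
export SPARK_HOME="$HOME/spark-2.1.0-bin-hadoop2.7"
# Hadoop's bin and sbin, but only Spark's bin (see the note below)
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin"
```

After `source ~/.bashrc`, `start-dfs.sh` and `start-yarn.sh` bring up the HDFS and YARN daemons. `$SPARK_HOME/sbin` is left out of PATH on purpose: both Hadoop and Spark ship a `start-all.sh`, and having both on PATH makes it easy to run the wrong one.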

For Spark I had to set some environment variables for the log locations, which took a while.
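The post does not record exactly which variables were needed, so this is only a sketch, assuming the usual ones from Spark's `conf/spark-env.sh.template`: `SPARK_LOG_DIR` for the daemons' own logs and `SPARK_HISTORY_OPTS` to tell the History Server (port 18080) where event logs live. The directories are hypothetical.

```shell
# Sketch of conf/spark-env.sh (copied from conf/spark-env.sh.template).
# Directory choices are hypothetical.

# Where Spark daemons such as the history server write their own log files
# (default: $SPARK_HOME/logs)
export SPARK_LOG_DIR="$HOME/spark-logs"

# Properties passed only to the history server: where to read event logs from
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=file://$HOME/spark-events"
```

Both directories must exist (`mkdir -p ~/spark-logs ~/spark-events`). For applications such as spark-shell to show up on port 18080, `conf/spark-defaults.conf` also needs `spark.eventLog.enabled true` and `spark.eventLog.dir` pointing at the same `file://` directory; the server itself starts with `$SPARK_HOME/sbin/start-history-server.sh`.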

Now the Spark console, spark-shell, works and I can play a bit.

There are a lot of useful ports for monitoring the processes.

Useful ports for Hadoop and Spark

NameNode RPC (fs.defaultFS): hdfs://localhost:9000

NameNode web UI: http://localhost:50070/dfshealth.html#tab-overview

Secondary NameNode: http://0.0.0.0:50090

DataNode: http://0.0.0.0:50075

YARN ResourceManager: http://localhost:8088

YARN NodeManager: http://localhost:8042/node

Spark History Server: http://127.0.0.1:18080

spark-shell (application UI): http://127.0.0.1:4040
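To see at a glance which of these UIs are up, or whether the firewall lets them through when checking from the host, a small script can try a TCP connection to each port. This is a sketch, assuming bash (it uses bash's `/dev/tcp` pseudo-device, so no curl is needed) and coreutils `timeout`; the function names are hypothetical. A successful connection only means something is listening, not that the UI is healthy.

```shell
#!/usr/bin/env bash
# name:port pairs from the list of web UIs above
UI_PORTS=(
  "NameNode:50070"
  "SecondaryNameNode:50090"
  "DataNode:50075"
  "ResourceManager:8088"
  "NodeManager:8042"
  "HistoryServer:18080"
  "spark-shell:4040"
)

# Succeeds if something accepts a TCP connection on host:port.
# timeout keeps filtered (firewalled) ports from hanging the check.
port_open() {
  local host=$1 port=$2
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Prints one UP/DOWN line per UI; host defaults to localhost
check_uis() {
  local host=${1:-localhost} entry name port
  for entry in "${UI_PORTS[@]}"; do
    name=${entry%%:*}
    port=${entry##*:}
    if port_open "$host" "$port"; then
      echo "UP   $name ($host:$port)"
    else
      echo "DOWN $name ($host:$port)"
    fi
  done
}
```

Inside the VM, `check_uis` shows which daemons are running; from the host, `check_uis <vm-address>` shows which ports actually get through. Note that the spark-shell UI on 4040 only exists while a shell or application is running.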

Written by Giovanni

February 23, 2022 at 3:32 pm

Posted in Varie

Scala

I have started re-reading Scala for the Impatient; today's lesson was on arrays.

Written by Giovanni

February 16, 2022 at 8:35 pm

Posted in Varie