Loads data from a CSV file to train a model, then uses the trained model to predict the output.
The data used here is from https://www.kaggle.com/c/titanic
To run Spark from Eclipse, set the following VM argument:
-Xmx512m
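To confirm that the VM argument was actually picked up by the Eclipse run configuration, a small check like the following can help. This is only a sketch: the class name HeapCheck and its method are hypothetical and not part of the original code. Runtime.maxMemory() is standard Java, and the value it reports may be somewhat lower than the -Xmx setting depending on the garbage collector in use.

```java
public class HeapCheck {

    /** Returns the maximum heap the JVM will attempt to use, in whole megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // With -Xmx512m this should print a value at or a little below 512.
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
    }
}
```

If the printed value is far from what was configured, the argument was probably added to the wrong run configuration or to program arguments instead of VM arguments.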
Some important tips come from the Kaggle forums.
The source code follows.
Note: if Apache Hadoop is not installed on the local machine, download the Hadoop binaries and set the system property hadoop.home.dir to their location. If you are running standalone code and do not want to hardcode the value, set the property with the -D option.
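One way to combine both approaches is sketched below: the property is set in code only when it was not already supplied with -D. The class name HadoopHomeSetup, the method ensureHadoopHome, and the fallback path C:/hadoop are all hypothetical and used here for illustration only. The sketch assumes it runs before any Spark or Hadoop class is loaded, since the property is typically read early.

```java
public class HadoopHomeSetup {

    static final String HADOOP_HOME_PROPERTY = "hadoop.home.dir";

    /**
     * Sets hadoop.home.dir to fallbackDir only if the property is not
     * already set (for example via -Dhadoop.home.dir=...).
     * Returns the value that is in effect afterwards.
     */
    static String ensureHadoopHome(String fallbackDir) {
        String current = System.getProperty(HADOOP_HOME_PROPERTY);
        if (current == null || current.isEmpty()) {
            System.setProperty(HADOOP_HOME_PROPERTY, fallbackDir);
            return fallbackDir;
        }
        return current;
    }

    public static void main(String[] args) {
        // Hypothetical fallback location; replace with the folder that
        // holds the downloaded Hadoop binaries.
        String fallback = args.length > 0 ? args[0] : "C:/hadoop";
        System.out.println("hadoop.home.dir = " + ensureHadoopHome(fallback));
    }
}
```

To avoid the hardcoded fallback entirely, pass the value on the command line, for example java -Dhadoop.home.dir=/path/to/hadoop HadoopHomeSetup, or add the same -D entry to the VM arguments in Eclipse.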