Amazon Reviews Sentiment Analysis


A real-time sentiment analysis pipeline for Amazon product reviews using Apache Kafka for data streaming, Apache Spark Streaming for processing and prediction, and MongoDB for storing enriched results.
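The prediction and enrichment stage in the middle of that flow can be sketched for a single Kafka message. This is a minimal illustration with several assumptions. Reviews are assumed to arrive as JSON strings with hypothetical `review_id` and `review_text` fields. The small keyword lexicon stands in for the project's trained model, which this page does not show. The function name `predict_sentiment` matches the one used in the consumer excerpt, but this version is only an assumed shape: it takes the message value and returns the enriched document that would be stored in MongoDB.

```python
import json

# Hypothetical word lists: a stand-in for the project's trained model,
# which is not shown on this page.
POSITIVE = {"great", "excellent", "love", "good", "perfect", "amazing"}
NEGATIVE = {"bad", "terrible", "broken", "poor", "waste", "awful"}


def score_text(text):
    """Return (# positive hits - # negative hits) over the words of a review."""
    words = [w.strip(".,!?;:\"'()").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def predict_sentiment(raw_value):
    """Parse one Kafka message value (a JSON string) and return an enriched document.

    Assumes a hypothetical schema with 'review_id' and 'review_text' fields.
    """
    review = json.loads(raw_value)
    score = score_text(review.get("review_text", ""))
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    # Keep the original review fields and attach the prediction to them.
    return {**review, "sentiment": label, "sentiment_score": score}


if __name__ == "__main__":
    message = json.dumps({"review_id": "r1", "review_text": "Great product, I love it!"})
    print(predict_sentiment(message)["sentiment"])  # positive
```

Returning the original fields alongside the label keeps each stored document self-describing, so results in MongoDB can be queried by review without a join back to the raw stream.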

Tech stack: Apache Kafka, Apache Spark, MongoDB, Python, Docker, Machine Learning

Architecture Overview

Key Features

Spark Streaming Consumer (excerpt)

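```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
# DStream Kafka 0.8 integration: available in Spark 2.x, removed in Spark 3.0
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="AmazonReviewsSentiment")
ssc = StreamingContext(sc, batchDuration=5)  # 5-second micro-batches

kafka_stream = KafkaUtils.createDirectStream(
    ssc, topics=['amazon-reviews'],
    kafkaParams={'metadata.broker.list': 'localhost:9092'}
)

# Records arrive as (key, value) pairs; pass the message value to the model.
# predict_sentiment and save_to_mongodb are defined elsewhere in the project.
predictions = kafka_stream.map(lambda record: predict_sentiment(record[1]))
predictions.foreachRDD(save_to_mongodb)

ssc.start()
ssc.awaitTermination()  # keep the driver alive while batches are processed
```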
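`save_to_mongodb` is called once per micro-batch by `foreachRDD`. One common way to write it, sketched here under assumptions, opens the MongoDB client inside `foreachPartition`, so the client (which cannot be pickled) is created on the executors instead of being shipped from the driver. The URI, database, and collection names are hypothetical, and `pymongo` is assumed to be installed on the worker nodes. This is not necessarily how the project implements it.

```python
from contextlib import contextmanager

# Hypothetical connection settings; the project's real values are not shown.
MONGO_URI = "mongodb://localhost:27017"
DB_NAME = "amazon_reviews"
COLLECTION_NAME = "predictions"


@contextmanager
def mongo_collection():
    """Open a client on the executor, yield the target collection, then close it."""
    from pymongo import MongoClient  # imported lazily: only needed where writes happen
    client = MongoClient(MONGO_URI)
    try:
        yield client[DB_NAME][COLLECTION_NAME]
    finally:
        client.close()


def write_partition(records, open_collection=mongo_collection):
    """Insert one partition's enriched documents in a single batch; return the count."""
    docs = list(records)
    if docs:  # insert_many rejects an empty list, so skip empty partitions
        with open_collection() as collection:
            collection.insert_many(docs)
    return len(docs)


def save_to_mongodb(rdd):
    """foreachRDD callback: write every partition of this micro-batch."""
    rdd.foreachPartition(write_partition)
```

Batching with `insert_many` issues one write per partition rather than one per review. The `open_collection` parameter exists only so `write_partition` can be exercised with a stand-in collection when no MongoDB server is available.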

References & Links