GitHub - stdatalabs/inverted-index: An implementation of inverted index in Mapreduce and Spark

stdatalabs/inverted-index

MapReduce VS Spark - Inverted Index Example

Compares MapReduce and Spark by building an inverted index with each.

Requirements

  • IDE
  • Apache Maven 3.x
  • JVM 6 or 7

General Info

The repository contains two projects: MRInvertedIndex (MapReduce) and SparkInvertedIndex (Spark).

  • com/stdatalabs/SparkInvertedIndex
    • Driver.scala -- Spark code to build inverted index
  • com/stdatalabs/MRInvertedIndex
    • InvertedIndexMapper.java -- Reads files in input directory and outputs (word, filename) as key-value pair
    • InvertedIndexReducer.java -- Reads the list of (word, filename) key-value pairs and outputs (word, (filename, count))
    • InvertedIndexDriver.java -- Driver program for MapReduce jobs
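
The map and reduce steps described above can be sketched in plain Java without any Hadoop dependency. This is only an illustration of the data flow, not the repository's code: the class name InvertedIndexSketch, its method names, the tokenization rule (lowercase, split on non-word characters), and the sample file names are all assumptions, and the sketch uses Java 8 or later.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of the mapper/reducer data flow (no Hadoop involved).
class InvertedIndexSketch {

    // Map step: emit one (word, filename) pair for every token in the file.
    static List<Map.Entry<String, String>> map(String filename, String content) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        // Assumed tokenization: lowercase, split on runs of non-word characters.
        for (String token : content.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, filename));
            }
        }
        return pairs;
    }

    // Reduce step: for one word, count how often each filename appears.
    static Map<String, Integer> reduce(List<String> filenames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String filename : filenames) {
            counts.merge(filename, 1, Integer::sum);
        }
        return counts;
    }

    // Stand-in for the shuffle phase: group filenames by word, then reduce each group.
    static Map<String, Map<String, Integer>> buildIndex(Map<String, String> files) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            for (Map.Entry<String, String> pair : map(file.getKey(), file.getValue())) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Map<String, Integer>> index = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            index.put(entry.getKey(), reduce(entry.getValue()));
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical input: filename -> file content.
        Map<String, String> files = new LinkedHashMap<>();
        files.put("a.txt", "spark and hadoop");
        files.put("b.txt", "spark spark");
        buildIndex(files).forEach(
            (word, postings) -> System.out.println(word + " -> " + postings));
    }
}
```

With the sample input, the entry for "spark" lists a.txt once and b.txt twice, which matches the (word, (filename, count)) shape of the reducer output. In a real MapReduce job the grouping done in buildIndex is handled by the framework's shuffle between the mapper and the reducer.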

Description

More articles on the Hadoop technology stack are available at stdatalabs.
