- Sessionize the web log by IP. Sessionize = aggregate all page hits by visitor/IP during a session. https://en.wikipedia.org/wiki/Session_(web_analytics)
- Determine the average session time.
- Determine unique URL visits per session. To clarify, count a hit to a unique URL only once per session.
- Find the most engaged users, i.e. the IPs with the longest session times.
- Predict the expected load (requests/second) in the next minute.
- Predict the session length for a given IP.
- Predict the number of unique URL visits by a given IP.
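
The sessionization and the three analytical goals can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not a prescribed solution: the 15-minute inactivity timeout, the `(ip, timestamp, url)` input shape, the function names, and the sample hits are all made up for the example. A real run over the full log would more likely use Spark or Hive on one of the sandbox VMs.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a session ends after 15 minutes of inactivity from the same IP.
SESSION_TIMEOUT = timedelta(minutes=15)


def sessionize(hits, timeout=SESSION_TIMEOUT):
    """Group (ip, timestamp, url) hits into per-IP sessions.

    Returns a list of dicts with keys: ip, start, end, urls (a set).
    """
    by_ip = defaultdict(list)
    for ip, ts, url in hits:
        by_ip[ip].append((ts, url))

    sessions = []
    for ip, events in by_ip.items():
        events.sort(key=lambda e: e[0])  # order each IP's hits by time
        current = None
        for ts, url in events:
            # Start a new session on the first hit or after a long gap.
            if current is None or ts - current["end"] > timeout:
                current = {"ip": ip, "start": ts, "end": ts, "urls": set()}
                sessions.append(current)
            current["end"] = ts
            current["urls"].add(url)  # a set counts each URL once per session
    return sessions


def duration_seconds(session):
    return (session["end"] - session["start"]).total_seconds()


def average_session_seconds(sessions):
    if not sessions:
        return 0.0
    return sum(duration_seconds(s) for s in sessions) / len(sessions)


def most_engaged(sessions, n=10):
    """Rank IPs by their longest single session, longest first."""
    longest = {}
    for s in sessions:
        longest[s["ip"]] = max(longest.get(s["ip"], 0.0), duration_seconds(s))
    return sorted(longest.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Made-up sample hits (documentation-range IPs), not taken from the real log.
t0 = datetime(2015, 7, 22, 9, 0, 0)
sample_hits = [
    ("192.0.2.1", t0, "/a"),
    ("192.0.2.1", t0 + timedelta(minutes=5), "/b"),
    ("192.0.2.1", t0 + timedelta(minutes=5, seconds=30), "/a"),
    ("192.0.2.1", t0 + timedelta(minutes=40), "/c"),  # gap > timeout: new session
    ("198.51.100.7", t0 + timedelta(minutes=1), "/a"),
]
sample_sessions = sessionize(sample_hits)
unique_url_counts = [len(s["urls"]) for s in sample_sessions]
```

A single-hit session has a duration of zero under this definition. Whether to keep such sessions in the average is a judgment call the task leaves open.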
HDP Sandbox: http://hortonworks.com/hdp/downloads/ or CDH QuickStart VM: http://www.cloudera.com/content/cloudera/en/downloads.html

Access log entry format: http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/access-log-collection.html#access-log-entry-format
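
As a rough sketch of reading one log entry, the snippet below pulls out the client IP, timestamp, and requested URL, which are the three values sessionization needs. The field order in `FIELDS` is an assumption based on the Classic Load Balancer entry format described at the link above, and should be checked against that page. The function name `parse_line` and the sample line are illustrative only. `shlex.split` is used here as a convenient way to honor the double-quoted request and user-agent fields; it raises `ValueError` on lines with unbalanced quotes.

```python
import shlex
from datetime import datetime

# Assumed field order for a Classic Load Balancer access log entry.
FIELDS = [
    "timestamp", "elb", "client", "backend",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes", "request", "user_agent",
    "ssl_cipher", "ssl_protocol",
]


def parse_line(line):
    """Return (client_ip, timestamp, url) for one access log line."""
    tokens = shlex.split(line)  # keeps quoted fields together
    if len(tokens) < len(FIELDS):
        raise ValueError("unexpected field count: %d" % len(tokens))
    entry = dict(zip(FIELDS, tokens))

    # Assumes a UTC timestamp with fractional seconds, e.g. ...43.945958Z
    ts = datetime.strptime(entry["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    # The client field is "ip:port"; keep only the IP part.
    ip = entry["client"].rsplit(":", 1)[0]
    # The request field is "METHOD URL PROTOCOL".
    parts = entry["request"].split(" ")
    url = parts[1] if len(parts) >= 2 else entry["request"]
    return ip, ts, url


# Illustrative line in the assumed format, not taken from the challenge data.
sample_line = (
    '2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 '
    '10.0.0.1:80 0.000073 0.001048 0.000057 200 200 0 29 '
    '"GET http://www.example.com:80/ HTTP/1.1" "curl/7.38.0" - -'
)
sample_ip, sample_ts, sample_url = parse_line(sample_line)
```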