Sheffield Hallam University

Course: MSc Big Data Analytics

Objective ~ To share the knowledge gained from the MSc Big Data Analytics modules

Interesting Projects

Google Cloud Platform

• Create and connect to a virtual machine on Google Cloud Platform.
• Use an SSH client to manage a VM instance in Google Cloud Compute Engine.
• Install a MySQL database, then use and test MySQL.
• Populate the database with the clean copy of the dataset prepared previously.
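
The populate step above can be sketched in Python. This is only an illustration under stated assumptions: it uses the standard-library `sqlite3` module as a stand-in for MySQL so that it runs without a server, and the table name, column names, and sample rows are hypothetical, not taken from the project dataset.

```python
import csv
import io
import sqlite3

# Hypothetical cleaned dataset; the real dataset is not shown in this README.
CLEAN_CSV = """id,name,score
1,alpha,10
2,beta,20
"""

def load_rows(conn, csv_text):
    """Create the table if needed, insert every CSV row, and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, score INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(int(r["id"]), r["name"], int(r["score"])) for r in reader]
    # Parameterised insert: values are bound, never pasted into the SQL text.
    conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")  # in-memory database, stand-in for MySQL
inserted = load_rows(conn, CLEAN_CSV)
total = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(inserted, total)
```

Against a real MySQL instance the connection call and the placeholder style would differ (common MySQL drivers use `%s` rather than `?`), but the create, bulk insert, and verify sequence stays the same.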

Microsoft Azure Cloud Services, Data Storage on Platform as a Service (PaaS)

Deploy a simple Messages application to Azure Cloud Services. The application stores simple textual messages and associated images in an Azure Storage Account, using both Table Storage and Blob Storage.
• Create a Storage Account and a Cloud Service, and deploy the Messages application on Azure (PaaS).
• Run and configure the Messages application, and investigate Azure Storage using SQL.

Azure SQL Database (PaaS)

• Deploy an Azure SQL DB server (PaaS) and configure firewall rules to allow access from different IP addresses.
• Create the SalesDB database and its tables on the Azure SQL DB server, and bulk upload data to SalesDB.
• In Visual Studio, create more tables in SalesDB and populate the Products and Sales Orders tables.

Azure VM & SQL Server (SaaS) – Security Issues

• Run the local client (a VMware virtual machine), access the MS Azure Portal, create a Storage Account
• Create a Windows virtual machine with SQL Server, install security certificates, install the client VPN package
• Open a Remote Desktop Connection to the Azure VM, create a new database via the VM desktop
• Set up a remote connection to SQL Server on the VM, import data into SQL Server from the local client

Cassandra on Virtual Machines (OS- Ubuntu) – Flights Dataset

• On VMware create two virtual machines running the Ubuntu operating system and install Cassandra.
• Configure Cassandra on each VM so that the nodes form a Cassandra cluster.
• Use one of the Cassandra nodes to create keyspaces and tables, populating tables and querying data.
• Use CQL statements to Create, Retrieve, Update and Delete data.
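
As a rough illustration of the CQL create, retrieve, update, and delete steps above, the sketch below keeps the statements as Python strings and runs them through any session-like object that has an `execute(statement, parameters)` method, which is the shape of the DataStax Python driver's `Session.execute`. The keyspace `demo`, the `flights` columns, and the sample values are hypothetical; the replication factor of 2 simply assumes the two-node cluster described above.

```python
# CQL for a hypothetical flights table; names and values are illustrative only.
CREATE_KEYSPACE = (
    "CREATE KEYSPACE IF NOT EXISTS demo "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2}"
)
CREATE_TABLE = (
    "CREATE TABLE IF NOT EXISTS demo.flights ("
    "flight_id int PRIMARY KEY, origin text, destination text)"
)
INSERT = (
    "INSERT INTO demo.flights (flight_id, origin, destination) "
    "VALUES (%s, %s, %s)"
)
SELECT = "SELECT origin, destination FROM demo.flights WHERE flight_id = %s"
UPDATE = "UPDATE demo.flights SET destination = %s WHERE flight_id = %s"
DELETE = "DELETE FROM demo.flights WHERE flight_id = %s"

def run_crud(session):
    """Run schema setup and one CRUD cycle; return the rows read back."""
    session.execute(CREATE_KEYSPACE)
    session.execute(CREATE_TABLE)
    session.execute(INSERT, (1, "MAN", "JFK"))       # Create
    rows = list(session.execute(SELECT, (1,)))       # Retrieve
    session.execute(UPDATE, ("LHR", 1))              # Update
    session.execute(DELETE, (1,))                    # Delete
    return rows
```

With the driver installed, a session would typically come from `Cluster([...]).connect()`, passing the address of one of the Cassandra nodes; the same statements can also be typed directly into `cqlsh` with literal values in place of the `%s` placeholders.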

Neo4j’s Graph database (Cypher Query Language) – Movies & NorthWind Dataset

• A relational database is imported into Neo4j and converted to a graph database.
• Query the graph using Neo4j’s Cypher query language.
• Copy and run the Cloud Data VM in VMware and connect to it with the PuTTY Secure Shell (SSH) client.
• Make configuration changes to the Neo4j database system using the Neo4j Shell user interface.

MongoDB database (NoSQL)

• Use the Cloud Data Ubuntu (Linux) virtual machine, where MongoDB is already installed.
• Run the Cloud Data VM as a MongoDB server in VMware Workstation.
• Interact with MongoDB using the PuTTY Secure Shell (SSH) client and the MongoDB Shell
• CRUD operations in MongoDB, moving bulk data into MongoDB, MongoDB Compass
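
A minimal sketch of one CRUD cycle, assuming a collection object with the `insert_one`, `find_one`, `update_one`, and `delete_one` methods that PyMongo collections provide. The document fields and values are hypothetical, and the function takes the collection as an argument so that no running server is assumed here.

```python
def crud_demo(collection):
    """Run one create, read, update, delete cycle on a collection-like object."""
    # Create: insert a single document with an explicit _id.
    collection.insert_one({"_id": 1, "name": "sample", "status": "new"})
    # Read: fetch the document back by its _id.
    found = collection.find_one({"_id": 1})
    # Update: change one field with the $set operator.
    collection.update_one({"_id": 1}, {"$set": {"status": "processed"}})
    updated = collection.find_one({"_id": 1})
    # Delete: remove the document; a later lookup should find nothing.
    collection.delete_one({"_id": 1})
    remaining = collection.find_one({"_id": 1})
    return found, updated, remaining
```

With PyMongo installed, the collection would typically be obtained as `MongoClient(...)["some_db"]["some_collection"]`, where both names are placeholders. For bulk loads, `insert_many` takes a list of documents in one call.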

Oracle APEX – GUI Environment (SQL Plus) – Performance Tuning

• Performance Tuning & Optimization, Physical layers, such as disks and RAM
• Operating System (OS), Database Server processes, Data types, location and volumes
• SQL optimization, Relationships and sequences set up

Final Assignment - Oracle APEX – GUI Environment (SQL Plus)

• Act as a small team working for a Software House
• Sell applications which use MySQL, Oracle and Microsoft SQL Server as the back-ends
• Assess moving from on-premises databases to cloud databases, and whether to continue using SQL databases or switch to NoSQL systems.

Acquiring, Manipulating Data in Hadoop, Hive, Pig (Hue) - Wordcount program

• Run the Hortonworks Sandbox on a CentOS Linux machine in VMware 14 by copying the VM image.
• Use PuTTY to interact with the Sandbox, run Hadoop, and pass it the MapReduce JAR file for the word count.
• Call the Hadoop executable and run the job on a dataset stored in HDFS.

HCatalog, Hive and Pig – World Airports Dataset

• Translate HiveQL statements into MapReduce jobs, submitted to Hadoop for execution.
• Use CSV and JSON files for loading, sorting, joining, and aggregating data – task repetition
• Produce a sequence of MapReduce jobs for the manipulation and analysis of large datasets in Pig.

Microsoft Azure -- A managed Hadoop environment in the cloud

• Azure HDInsight
• Configuring SQL Server 2012 Virtual Machine in Azure including MS Excel with Hive ODBC Plug-in
• Provisioning an HDInsight Hadoop Cluster on Azure, Connecting to your Remote Desktop

Running, Modifying Wordcount program on Cloudera Hadoop

• Python (v2) and Java MapReduce wordcount
• Connect to HDInsight via PuTTY to run the Python wordcount
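
For readers unfamiliar with the MapReduce word count, here is a small local sketch of the map, sort, and reduce phases. It is written for Python 3 (the project itself mentions Python v2) and simulates the whole pipeline in one process, so it is not the course's Hadoop script; on a cluster the mapper and reducer would normally be separate programs reading from standard input.

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Map step: emit (word, 1) for every whitespace-separated word."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1

def reducer(pairs):
    """Reduce step: sum the counts per word; pairs must arrive sorted by word."""
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)

def word_count(lines):
    # sorted() stands in for Hadoop's shuffle-and-sort between map and reduce.
    return dict(reducer(sorted(mapper(lines))))

counts = word_count(["big data big", "data analytics"])
print(counts)
```

The sort between the two steps matters: `groupby` only merges adjacent equal keys, which mirrors the guarantee Hadoop gives a reducer that all values for one key arrive together.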

Apache Sqoop - Sharing Data between Hadoop and other databases

• Data Acquisition - Importing and exporting data between Hadoop and MySQL using Sqoop.
• Share multiple tables and selected rows between HDFS and MySQL

Apache NiFi – Converting CSV files to Avro and JSON formats for ingestion into a NoSQL database

• DataFlow - Transforming different data sources into JSON format
• Managing the flow of data from different sources, to and from HDFS.
• The process includes GetFile, InferAvroSchema, ConvertCSVToAvro, ConvertAvroToJSON, then saving the JSON
• Using NiFi for data acquisition from Twitter
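
To make the CSV to JSON transformation concrete, the sketch below does the equivalent conversion in plain Python with the standard `csv` and `json` modules. It is not NiFi and skips the intermediate Avro step; the sample columns are hypothetical, and because no schema is inferred, every value stays a string.

```python
import csv
import io
import json

def csv_to_json_lines(csv_text):
    """Convert CSV text with a header row into one JSON document per data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each row becomes a dict keyed by the header names, then a JSON string.
    return [json.dumps(row) for row in reader]

# Hypothetical sample input; real flow files would come from the source system.
sample = "id,name\n1,alpha\n2,beta\n"
json_lines = csv_to_json_lines(sample)
for line in json_lines:
    print(line)
```

One JSON document per line is a convenient shape for bulk import into document stores, which is the same motivation as the flow above.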

Oracle APEX - Loading data into Oracle APEX from a Tab Separated Value file

• Techniques used for data validation and cleansing patterns, automating data cleansing in Excel.
• Upload the TSV file into Oracle Application Express.
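
The validation and cleansing step can also be scripted before upload. The following is a sketch only, assuming a TSV file with a header row; the column names, the "required fields must be non-empty" rule, and the sample rows are hypothetical and not the project's actual checks.

```python
import csv
import io

def clean_tsv(tsv_text, required):
    """Trim whitespace in every field and split rows into accepted and rejected."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    accepted, rejected = [], []
    for row in reader:
        # Missing fields come back as None; surplus fields (key None) are dropped.
        cleaned = {k: (v or "").strip() for k, v in row.items() if k is not None}
        if all(cleaned.get(col) for col in required):
            accepted.append(cleaned)
        else:
            rejected.append(cleaned)
    return accepted, rejected

# Hypothetical sample: the second row has an empty name and is rejected.
sample = "id\tname\n1\t Alice \n2\t\n"
accepted, rejected = clean_tsv(sample, required=["id", "name"])
print(accepted, rejected)
```

Keeping the rejected rows, rather than silently dropping them, makes it possible to review and fix them before the file is loaded.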

Hadoop Using Pig

• Use a pre-configured virtual machine with Microsoft Azure HDInsight (Hadoop 2.7.3)
• Access the VM via SSH using an SSH client application – MobaXterm
• Write a Pig Latin script to handle the word count (text file in HDFS)
• Inspect the results via the Ambari web UI; NLP using Pig with Python UDFs
