GitHub - taymourniazi/SHU: SAS, Python and R Codes for Sheffield Hallam University Classwork

taymourniazi/SHU

Sheffield Hallam University

Course: MSc Big Data Analytics

Objective: To share knowledge gained from the MSc Big Data Analytics modules

Interesting Projects

Google Cloud Platform

• Create and connect to a virtual machine on Google Cloud Platform.
• Connect to the VM instance in Google Compute Engine using an SSH client.
• Install a MySQL database, then use and test MySQL.
• Populate the database with the clean copy of the dataset prepared previously.
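
The last step above, loading a cleaned dataset into a database, can be sketched as follows. This is a minimal illustration rather than the coursework code: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, and the table name, columns and sample rows are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical cleaned dataset; the real file and its columns are not given here.
CLEAN_CSV = """id,name,city
1,Alice,Sheffield
2,Bob,Leeds
"""


def populate(conn, csv_text):
    """Create a table, bulk-insert the rows of a cleaned CSV, return the row count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(int(r["id"]), r["name"], r["city"]) for r in reader]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS people "
        "(id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    # Parameterised bulk insert; a MySQL driver would use the same pattern
    # with its own placeholder style.
    conn.executemany("INSERT INTO people (id, name, city) VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]


conn = sqlite3.connect(":memory:")  # stand-in for the MySQL instance on the VM
print(populate(conn, CLEAN_CSV))
```

Against a real MySQL server the connection object would come from a MySQL driver instead of sqlite3, but the create, bulk-insert and verify steps stay the same.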

Microsoft Azure Cloud Services, Data Storage on Platform as a Service (PaaS)

Deploy a simple Messages application to Azure Cloud Services. The application stores simple text messages and their associated images in an Azure Storage Account, using both Table Storage and Blob Storage.
• Create a Storage Account and a Cloud Service, and deploy the Messages application on Azure (PaaS).
• Run and configure the Messages application, and investigate Azure Storage using SQL.

Azure SQL Database (PaaS)

• Deploy an Azure SQL DB Server (PaaS) and configure firewall rules to allow access from different IP addresses.
• Create the SalesDB database and its tables on the Azure SQL DB Server, and bulk upload data to SalesDB.
• In Visual Studio, create more tables in SalesDB and populate the Products and Sales Orders tables.
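
The firewall rules mentioned in the first bullet are defined as start and end IP address ranges. The sketch below shows, under that assumption, how a client address is matched against such ranges. The rule names and addresses are hypothetical (taken from the documentation-only IP blocks), and the check is a local illustration, not a call to any Azure API.

```python
from ipaddress import ip_address

# Hypothetical firewall rules: (rule name, start IP, end IP), both ends inclusive.
RULES = [
    ("university-lab", "203.0.113.10", "203.0.113.20"),
    ("home", "198.51.100.7", "198.51.100.7"),
]


def is_allowed(client_ip, rules=RULES):
    """Return True if client_ip falls inside any rule's inclusive range."""
    ip = ip_address(client_ip)
    return any(
        ip_address(start) <= ip <= ip_address(end) for _name, start, end in rules
    )


print(is_allowed("203.0.113.15"))  # inside the lab range
print(is_allowed("203.0.113.21"))  # one address past the lab range
```

A single-address rule is simply a range whose start and end are equal, which is how access from one extra IP address would be allowed.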

Azure VM & SQL Server (IaaS) – Security Issues

• Run the Local Client - VMware Virtual Machine, Access MS Azure Portal, Create Storage Account
• Create a Windows Virtual Machine with SQL Server, Install Security Certificates, Install Client VPN Package
• Remote Desktop Connection to Azure VM, Create New Database via VM Desktop
• Set Up Remote Connection to SQL Server on VM, Import Data into SQL Server from Local Client

Cassandra on Virtual Machines (OS- Ubuntu) – Flights Dataset

• On VMware create two virtual machines running the Ubuntu operating system and install Cassandra.
• Configure Cassandra on each VM so that the two nodes form a Cassandra cluster.
• Use one of the Cassandra nodes to create keyspaces and tables, populate the tables and query the data.
• Use CQL statements to Create, Retrieve, Update and Delete data.
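
As an illustration of the CQL steps above, the following sketch collects one statement per CRUD operation for a flights table. The keyspace name, table layout and values are hypothetical, and the script only prints the statements; they would be run in cqlsh or through a Cassandra driver on one of the nodes.

```python
# Hypothetical keyspace and table names for the flights dataset.
KEYSPACE = "flights_ks"
TABLE = f"{KEYSPACE}.flights"

CRUD = {
    # replication_factor 2 assumes the two-node cluster described above.
    "create_keyspace": (
        f"CREATE KEYSPACE IF NOT EXISTS {KEYSPACE} WITH replication = "
        "{'class': 'SimpleStrategy', 'replication_factor': 2}"
    ),
    "create_table": (
        f"CREATE TABLE IF NOT EXISTS {TABLE} "
        "(flight_id int PRIMARY KEY, origin text, dest text, dep_delay int)"
    ),
    "create": (
        f"INSERT INTO {TABLE} (flight_id, origin, dest, dep_delay) "
        "VALUES (1, 'MAN', 'LHR', 12)"
    ),
    "retrieve": f"SELECT origin, dest, dep_delay FROM {TABLE} WHERE flight_id = 1",
    "update": f"UPDATE {TABLE} SET dep_delay = 0 WHERE flight_id = 1",
    "delete": f"DELETE FROM {TABLE} WHERE flight_id = 1",
}

for name, statement in CRUD.items():
    print(f"-- {name}\n{statement};")
```

Every read, update and delete here filters on the primary key, which is the access pattern Cassandra is designed around.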

Neo4j’s Graph database (Cypher Query Language) – Movies & NorthWind Dataset

• A relational database is imported into Neo4j and converted to a graph database.
• Query the graph using Neo4j’s Cypher query language.
• Copy and run the Cloud Data VM in VMware and connect to it using the PuTTY Secure Shell (SSH) client.
• Make configuration changes to the Neo4j database system using the Neo4j Shell user interface.
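
The idea behind converting a relational database to a graph can be sketched in plain Python: each row becomes a node, and each foreign key becomes a relationship. The rows below are hypothetical, loosely in the style of NorthWind, and the in-memory structures only stand in for what Neo4j stores. The relationship name PART_OF is an assumption.

```python
# Hypothetical relational rows; the real tables are larger.
categories = [{"id": 1, "name": "Beverages"}, {"id": 2, "name": "Condiments"}]
products = [
    {"id": 10, "name": "Chai", "category_id": 1},
    {"id": 11, "name": "Chang", "category_id": 1},
    {"id": 12, "name": "Aniseed Syrup", "category_id": 2},
]


def to_graph(categories, products):
    """Turn rows into labelled nodes and foreign keys into relationships."""
    nodes = {}
    edges = []
    for c in categories:
        nodes[("Category", c["id"])] = {"name": c["name"]}
    for p in products:
        nodes[("Product", p["id"])] = {"name": p["name"]}
        # The category_id foreign key becomes an explicit PART_OF relationship.
        edges.append((("Product", p["id"]), "PART_OF", ("Category", p["category_id"])))
    return nodes, edges


def products_in(category_name, nodes, edges):
    """Roughly what this Cypher query would return:
    MATCH (p:Product)-[:PART_OF]->(c:Category {name: $name}) RETURN p.name
    """
    return sorted(
        nodes[src]["name"]
        for src, rel, dst in edges
        if rel == "PART_OF" and nodes[dst]["name"] == category_name
    )


nodes, edges = to_graph(categories, products)
print(products_in("Beverages", nodes, edges))
```

The join that a relational query would compute at read time is stored here as an edge, which is the main difference the conversion introduces.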

MongoDB database (NoSQL)

• Use the Cloud Data Ubuntu (Linux) virtual machine, where MongoDB is already installed.
• Run the Cloud Data VM as a MongoDB server in VMware Workstation.
• Interact with MongoDB using the PuTTY Secure Shell (SSH) client and the MongoDB Shell.
• CRUD operations in MongoDB, moving bulk data into MongoDB, MongoDB Compass.

Oracle APEX – GUI Environment (SQL Plus) – Performance Tuning

• Performance Tuning & Optimization, Physical layers, such as disks and RAM
• Operating System (OS), Database Server processes, Data types, location and volumes
• SQL optimization, Relationships and sequences set up

Final Assignment - Oracle APEX – GUI Environment (SQL Plus)

• Act as a small team working for a Software House
• Sell applications which use MySQL, Oracle and Microsoft SQL Server as the back-ends
• Evaluate moving from on-premises databases to cloud databases, and whether to continue using SQL databases or switch to NoSQL systems.

Acquiring, Manipulating Data in Hadoop, Hive, Pig (Hue) - Wordcount program

• Run the Hortonworks Sandbox (CentOS Linux) in VMware 14 from a copied VM image.
• Use PuTTY to interact with the Sandbox, run Hadoop and pass it the MapReduce JAR file for the word count.
• Call the Hadoop executable and run the job against a dataset stored in HDFS.

HCatalog, Hive and Pig – World Airports Dataset

• Translate HiveQL statements into MapReduce jobs, submitted to Hadoop for execution.
• Using CSV and JSON files for loading, sorting, joining and aggregating data – task repetition
• Produce a sequence of MapReduce jobs for the manipulation and analysis of large datasets in Pig.

Microsoft Azure -- A managed Hadoop environment in the cloud

• Azure HDInsight
• Configuring SQL Server 2012 Virtual Machine in Azure including MS Excel with Hive ODBC Plug-in
• Provisioning an HDInsight Hadoop Cluster on Azure, Connecting to your Remote Desktop

Running, Modifying Wordcount program on Cloudera Hadoop

• Python (v2) and Java MapReduce wordcount
• Connect with HDInsight via PuTTY to run the Python wordcount
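
The MapReduce word count mentioned above can be sketched in a few lines. This is a local, single-process illustration written for Python 3, not the coursework script: on a cluster the mapper and reducer would run as separate tasks, with Hadoop performing the sort and shuffle between them.

```python
from itertools import groupby


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(pairs):
    """Reduce phase: sum the counts for each word.

    sorted() stands in for the shuffle and sort step that Hadoop performs
    between the map and reduce phases.
    """
    for word, group in groupby(sorted(pairs), key=lambda pair: pair[0]):
        yield word, sum(count for _word, count in group)


def wordcount(lines):
    return dict(reducer(mapper(lines)))


print(wordcount(["the quick brown fox", "the lazy dog"]))
```

Splitting on whitespace and lowercasing is a deliberate simplification; punctuation handling would need an extra normalisation step in the mapper.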

Apache Sqoop - Sharing Data between Hadoop and other databases

• Data Acquisition - Importing and exporting data between Hadoop and MySQL using Sqoop.
• Share multiple tables and selected rows between HDFS and MySQL
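
A minimal sketch of how the import and export commands might be assembled is shown below. The JDBC URL, user name, table and HDFS paths are hypothetical, and the script only builds and prints the command lines; it does not run Sqoop.

```python
import shlex


def sqoop_import(jdbc_url, user, table, target_dir, where=None):
    """Build a 'sqoop import' command that copies a MySQL table into HDFS."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        "-P",  # prompt for the password instead of putting it on the command line
        "--table", table,
        "--target-dir", target_dir,
    ]
    if where:
        # Restrict the import to selected rows.
        args += ["--where", where]
    return args


def sqoop_export(jdbc_url, user, table, export_dir):
    """Build a 'sqoop export' command that copies HDFS files into a MySQL table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", user,
        "-P",
        "--table", table,
        "--export-dir", export_dir,
    ]


# All connection details below are placeholders.
cmd = sqoop_import(
    "jdbc:mysql://localhost/shop", "student", "orders",
    "/user/student/orders", where="amount > 100",
)
print(shlex.join(cmd))
```

Importing several tables means calling sqoop_import once per table, each with its own target directory.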

Apache NiFi – Converting CSV file to Avro, JSON Format to ingest in NoSQL database

• DataFlow - Transforming different data sources into JSON format
• Managing the flow of data from different sources, to and from HDFS.
• The process includes GetFile-InferAvroSchema-ConvertCSVToAvro-ConvertAvroToJSON-Save the JSON
• Using NiFi for data acquisition from Twitter
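
The processor chain above can be approximated in plain Python to show what each stage contributes: infer a schema from the CSV, cast the values to the inferred types, and emit JSON. This sketch uses only the standard library and does not call NiFi or an Avro library; the sample data and the record name are hypothetical, and the schema is only Avro-style.

```python
import csv
import io
import json

# Casting functions for the three Avro primitive types this sketch can infer.
CASTERS = {"long": int, "double": float, "string": str}


def infer_type(values):
    """Pick the narrowest type that every value in the column can be cast to."""
    for name in ("long", "double"):
        try:
            for value in values:
                CASTERS[name](value)
            return name
        except ValueError:
            continue
    return "string"


def csv_to_json(csv_text, record_name="Row"):
    """Return (avro_style_schema, json_text) for the given CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    fields = list(rows[0].keys()) if rows else []
    types = {f: infer_type([r[f] for r in rows]) for f in fields}
    schema = {
        "type": "record",
        "name": record_name,
        "fields": [{"name": f, "type": types[f]} for f in fields],
    }
    records = [{f: CASTERS[types[f]](r[f]) for f in fields} for r in rows]
    return schema, json.dumps(records)


schema, json_text = csv_to_json("code,name,lat\n1,LHR,51.47\n2,MAN,53.35\n")
print(json.dumps(schema))
print(json_text)
```

Typing the columns before serialising is what keeps numbers as numbers in the resulting JSON, instead of leaving every field as a string.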

Oracle Apex - Loading data into Oracle Apex from Tab Separated Value file

• Techniques used for data validation and cleansing patterns, automating data cleansing in Excel.
• Uploading a TSV file into Oracle Application Express.
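
The validation and cleansing step can also be scripted before upload. The sketch below is one possible approach in Python rather than the Excel workflow described above; the column names (id, email) and the rules applied to them are hypothetical.

```python
import csv
import io


def clean_tsv(tsv_text):
    """Split TSV rows into (valid, rejected) using a few simple rules.

    Hypothetical rules: 'id' must be all digits, 'email' must contain '@'.
    Whitespace is trimmed and emails are lowercased on the way through.
    """
    valid, rejected = [], []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Data starts on line 2 because line 1 is the header.
    for line_no, raw in enumerate(reader, start=2):
        # Missing trailing fields come back as None; surplus fields (key None) are dropped.
        row = {k: (v or "").strip() for k, v in raw.items() if k is not None}
        if row.get("id", "").isdigit() and "@" in row.get("email", ""):
            row["email"] = row["email"].lower()
            valid.append(row)
        else:
            rejected.append((line_no, row))
    return valid, rejected


SAMPLE = "id\temail\n1\t Ann@Example.com \nx2\tbob@example.com\n3\tno-at-sign\n"
valid, rejected = clean_tsv(SAMPLE)
print(valid)
print([line_no for line_no, _row in rejected])
```

Keeping the rejected rows with their line numbers makes it possible to correct the source file instead of silently dropping data.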

Hadoop Using PIG

• Using a pre-configured virtual machine with Microsoft Azure HDInsight (Hadoop 2.7.3)
• Access the VM via SSH using an SSH client application – MobaXterm
• Write a Pig Latin script to handle the word count (text file in HDFS)
• Inspect the results via the Web UI - Ambari, NLP using Pig with Python UDFs
