Building a PC – Part 0

The first step in building a PC is deciding what kind of computer you actually need. This is why I have labeled this as Part 0 (Zero) instead of Part 1, as it really is a pre-step in your Custom PC Build Project.

For this post we will talk about the following PC Build Types:

  • Budget PC
  • Basic Desktop
  • Workstation / Power User Desktop
  • Gaming PC
  • Home Server

The first type of Custom PC Build is the “Budget PC”. It is a very basic PC that is mostly used for Internet browsing, although with the level of hardware available today even a Budget PC can handle basic school work (Primary School tasks should be fine, however some more advanced High School tasks might be slow) and Home Office tasks, and it will probably be OK as a “Work from Home” PC if your Internet is fast enough. It should be able to stream 1080p video, but it will struggle to keep up with 4K, and video conferences will probably look fuzzy and choppy. Specs in 2020 for a Budget PC are not as bad as they were in the past. You are probably looking at a 4-core / 4-thread CPU (possibly a 2-core CPU, but those are getting harder to find and really aren’t worth the money anymore) and somewhere between 4GB and 8GB of RAM (a store-bought Budget PC will probably have between 4GB and 6GB, but in a custom build the price difference between 6GB and 8GB is so small that if you go above 4GB you will probably opt for 8GB). You will probably only have one storage device, so it will likely be a hard drive (somewhere between 1TB and 2TB) rather than an SSD. You could find a 32GB – 128GB SSD for under $100 these days, good enough to boot from, but since we are talking about a budget build, let’s assume the SSD is off the table for now; it could be a nice upgrade in the future. A Budget PC will definitely not have a discrete graphics card, so you will have whatever integrated graphics comes with the CPU you buy.

The second type of Custom PC Build is the “Basic Desktop”. It is your everyday home PC that can be used to browse the Internet, watch videos, stream at 4K (if your Internet connection is fast enough), participate in video conferences and calls, edit photos, and handle basic school work (both Primary and Secondary, and probably most basic college projects that don’t require any specialized software) and Home Office tasks like using Microsoft Office to edit documents, spreadsheets, and presentations, or to serve as a “Work from Home” computer. It’s a relatively powerful computer that can be used for basic software development and some basic video editing, but you probably won’t do heavy video editing or play the latest games on this type of machine. It is one step above a Budget PC Build, but it will last you quite a bit longer, as it should keep up with the times better than a Budget PC. In 2020 a Basic Desktop will have between 8GB and 16GB of RAM (with 16GB recommended by the author), an SSD (between 256GB and 512GB), and possibly a second hard drive for additional storage (between 2TB and 4TB). The CPU will have between 4 and 8 cores and 8 to 16 threads (the author recommends at least 6 cores), and it may or may not have a discrete graphics card (if one is used, probably a previous-generation GPU with 4GB to 8GB of VRAM).

The third type of Custom PC Build is the “Workstation” or “Power User Desktop”. It is a very powerful desktop computer for the home, on par with or better than the workstations used in most large corporations. It can handle 3D CAD, heavy video editing and rendering, advanced software development, and pretty much anything you can throw at it, including playing most of the new games on the market. The biggest differences between a Workstation and a Gaming PC are how much you spend on the graphics card (you are still going to get a relatively powerful graphics card for rendering and other GPU tasks, but it might not need to be the latest and greatest card on the market) and the “coolness” factor. The Workstation in 2020 will have a lot of RAM (at least 32GB), multiple SSDs and secondary hard drives, at least 8 cores and 16 threads, plus a graphics card with at least 8GB of VRAM.

The fourth type of Custom PC Build is the Gaming PC. It is basically a Workstation build, but with the most powerful graphics card you can budget for and the coolness factor added in. Most probably you will want a case with a glass or clear plastic side, and the cables, fans, and other parts will have multi-colored LEDs; some motherboards even have mini LCDs integrated for effects. Water cooling will most probably be used, because gamers like to overclock their CPUs, RAM, and graphics cards. Even if you aren’t going to overclock, water cooling is still typically used in gaming PCs for the coolness factor, as the tubing and other components are usually translucent and lit up for effect. Strictly speaking, though, water cooling is not required, and even basic overclocking can be achieved with air cooling given enough case fans and the right CPU cooler (heatsink).

The fifth and final type of Custom PC Build is the Home Server. Here you want a server-level motherboard of the kind used by small businesses, or an AMD Threadripper, and you want to spend more money on RAM, CPU, and storage, and not so much on graphics. In 2020, we are talking about specs like a CPU with at least 16 cores and 32 threads (although the author recommends a 32-core / 64-thread CPU) and somewhere between 64GB and 256GB of RAM (the author recommends between 128GB and 256GB so you can run multiple VMs simultaneously). You will be running either Microsoft Windows Server (Microsoft offers a version called “Essentials” which is cheap enough and powerful enough for a home setup) or a Linux server installation. With this type of hardware you can run a hypervisor and run both Windows Server and a Linux server simultaneously, which again is the reason for maxing out the RAM and giving it as many CPU cores as you can afford. For this kind of setup you also want to consider multiple physical network cards, one for each OS you will be running, a very good power supply, and top-of-the-line fans, because it will be running 24/7 as your home server. I do not recommend water cooling for this same reason: I don’t trust water-cooling solutions for 24/7 machines. You don’t want one leaking and destroying your server in the middle of the night.

The next parts in this blog series will discuss building a “Basic Desktop”, as this is the most common setup you will find in most people’s homes.

Posted in Technology

Caesar Cipher in C

A simple C implementation of the Caesar Cipher. It supports full wrap-around for alphanumerics, and applies a modulus for shifts larger than 26 for letters and 10 for digits. (A sample run is shown after the code below.)

Background on Caesar Cipher: https://en.wikipedia.org/wiki/Caesar_cipher

/*
 Copyright 2019 Robert C. Ilardi

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
*/

//To Compile: gcc CaesarCipher.c -o CaesarCipher

/*
 Usage: ./CaesarCipher [SHIFT] [MESSAGE]

 A nice generic Caesar Cipher C program I just wrote
 for the hell of it.

 Use negative numbers to reverse the shifting.
*/

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

typedef unsigned int bool;

const bool TRUE=1;
const bool FALSE=0;

int main(int argc, char* argv[])
{
	int shift, tmpShift;
	char *mesg;
	char *outputMesg;
	char ch;
	bool isAlpha, isUpper, isLower, isDigit;

	if (argc!=3)
	{
		fprintf(stderr, "Usage: %s [SHIFT] [MESSAGE]\n", argv[0]);
		return EXIT_FAILURE;
	}
	
	shift=atoi(argv[1]);
	mesg=argv[2];

	printf("Shift Amount: %d\n", shift);
	printf("Message: %s\n", mesg);

	outputMesg = (char*)malloc(strlen(mesg)+1);

	for (int i=0; i<strlen(mesg); i++)
	{
		ch=mesg[i];
	isUpper = (ch >= 'A' && ch <='Z');
	isLower = (ch >= 'a' && ch <='z');
	isAlpha = ((ch >= 'A' && ch <='Z') || (ch >= 'a' && ch <='z'));
	isDigit = (ch >= '0' && ch <= '9');

		if (isAlpha)
		{
			if (shift > 0)
			{
				if (shift>26)
				{
					tmpShift=shift % 26;
				}
				else
				{
					tmpShift=shift;
				}

				if (isUpper && (ch + tmpShift) > 'Z')
				{
					ch = 'A' + ((tmpShift - ('Z' - ch)) - 1);
				}
				else if (isLower && (ch + tmpShift) > 'z')
				{
					ch = 'a' + ((tmpShift - ('z' - ch)) - 1);
				}
				else
				{
					ch += tmpShift;
				}
			}
			else if (shift < 0)
			{
				if (abs(shift)>26)
				{
					tmpShift=shift % 26;
				}
				else
				{
					tmpShift=shift;
				}

				if (isUpper && (ch + tmpShift) < 'A')
				{
					ch = 'Z' - (('A' - ch) + abs(tmpShift) - 1);
				}
				else if (isLower && (ch + tmpShift) < 'a')
				{
					ch = 'z' - (('a' - ch) + abs(tmpShift) - 1);
				}
				else
				{
					 ch += tmpShift;
				}
			}
		}
		else if (isDigit)
		{
			if (shift>0)
			{
				if (shift>10)
				{
					tmpShift=shift % 10;
				}
				else
				{
					tmpShift=shift;
				}

				if ((ch + tmpShift) > '9')
				{
					ch = '0' + ((tmpShift - ('9' - ch)) - 1);
				}
				else
				{
					ch += tmpShift;
				}
			}
			else if (shift<0)
			{
				if (abs(shift)>10)
				{
					tmpShift=shift % 10;
				}
				else
				{
					tmpShift=shift;
				}

				if ((ch + tmpShift) < '0')
				{
					ch = '9' - (('0' - ch) + abs(tmpShift) - 1);
				}
				else
				{
					ch += tmpShift;
				}
			}
		}

		outputMesg[i]=ch;
	}

	outputMesg[strlen(mesg)]='\0'; /* terminate using the input length; outputMesg is not yet null-terminated */

	printf("Output Message: %s\n", outputMesg);

	free(outputMesg);

	return EXIT_SUCCESS;
}
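For example, compiling with the command in the comment above and running the program with a shift of 3, then reversing it with -3, should produce output along these lines (a sample run, not from the original post):

$ gcc CaesarCipher.c -o CaesarCipher
$ ./CaesarCipher 3 "Attack at Dawn 123"
Shift Amount: 3
Message: Attack at Dawn 123
Output Message: Dwwdfn dw Gdzq 456
$ ./CaesarCipher -3 "Dwwdfn dw Gdzq 456"
Shift Amount: -3
Message: Dwwdfn dw Gdzq 456
Output Message: Attack at Dawn 123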

Posted in Computer Fun Stuff, Development, Programming General, Security, Technology

As a red-blooded American Capitalist, why read Karl Marx?

So some of you may be asking yourselves why I have read Marx and why I am even willing to go see his statue while I’m in town in Berlin. The answer is simple: “Know thy self, know thy enemy. A thousand battles, a thousand victories.” – Sun Tzu, The Art of War. And yes, Socialism in all its forms is my enemy. Why? Because socialism is the enemy of individual freedom by its very definition. And an enemy of freedom is also my enemy…

Know thy Enemy…

~Robert; Germany, July 2018

Posted in Philosophy, Politics, Society

We Need An Internet First Amendment NOW!

Let’s get the Hashtag: #InternetFirstAmendment trending!

Today, Alex Jones’ InfoWars was removed from the social media platforms: Facebook, Spotify, YouTube, and Apple iTunes PodCasts. This is a major attack on the freedom of expression on today’s Internet.

You don’t need to like Alex Jones or agree with him, but he does have the right to his opinions and he should have the protected right to say what he wishes on the Internet.

It should be up to the consumer to “change the channel” if they don’t like what his brand of content says. We cannot give up our right to share ideas and read other people’s ideas to a handful of big tech firms. The Internet has become the new Public Forum, and therefore our speech and writing on the Internet needs to be protected by the First Amendment.

When I started creating content on the Internet back in 1994, I was in high school, creating HTML pages by hand on a web hosting platform known as GeoCities.

On Today’s Internet, users do not need to know how to code even a simple markup language like HTML, and instead can use “Social Media” tools like Facebook and Twitter, or post Videos to YouTube and other video hosting sites.

It has become easier than ever to use the Internet to share our ideas, and for most of us, at least in the Western world, the majority of our daily communication is now done on the Internet, usually via a couple of dozen web sites at most.

Internet Censorship is on the rise, and we need to put a stop to it once and for all.

I am a capitalist through and through, but more than that I’m an American and somewhat of a Constitutionalist, especially when it comes to our rights, like Freedom of Speech and Freedom of the Press.

I know the First Amendment is meant to protect speech in the public square; however, these Social Media sites are the new public square.

So I’m asking everyone who believes in their own right to share their thoughts, their passions, and their opinions to start making calls and writing to your Senators and your Representatives, at the Federal level, but also at the State level as well. Please ask them to work towards a bill, and hopefully an Amendment, that basically says that if a Technology Company, Web Hosting Service, Domain Registrar, Social Media Platform, Media Sharing Platform like YouTube or Apple’s iTunes, or Search Engine like Google or Bing wants to continue to operate within the United States, it MUST respect the First Amendment.

We aren’t talking about making private companies government owned; but just like other telecommunications companies, such as your Land Line Phone Company, Cellular Phone Company, Cable Company, and Broadcast Radio and TV, among others, they need to be regulated to prevent them from removing anyone’s content.

Let it fall to the realm of the US Courts to determine if someone’s account violates an actual Law. This way everyone’s Due Process Rights are protected, and everyone’s rights to a fair and free Public Forum on the Internet are protected.

Levels of Internet Censorship Slide:

 

 

Posted in Politics, Society, Technology

Red Tide: Prologue

The Democrat Party in the United States around the year 2045 was in shambles; a mere shadow of its former self. After approximately 4 major US Senate Election Cycles and 7 US Presidential Election Cycles, the Democratic Party only continued to fracture between what in 2018 was known as the Neo-Liberals and the “Alt-Left” (also known as the Far Left in certain circles). The DSA (Democratic Socialists of America) continued to gain traction among the Alt-Left, but was never able to gain more than a handful of seats in both houses of Congress and never had a truly viable contender for the Presidency. However, the Alt-Left on both coasts of the United States, and in a handful of other states where liberals had migrated in overwhelming numbers, bringing their socialist ideas along with them, became ever more disgruntled with their battle to wage a political war using the System. There had been cries by the Alt-Left since before President Donald J. Trump’s first election win in November 2016 that the “System” was too broken to fix from within; protesters and the far left had said the “Only Solution is Revolution”, and that call had only grown stronger in the 29 years since that monumental election of 2016.

The policies of President Trump, the Republicans, and those few Democrats who decided to break with party lines, reaching across the aisle and joining the new Renaissance in America, created a very powerful Economy for the American People. The picture was not as rosy for the rest of the world, and wherever Socialism gained power, those nations fell into Economic and Social decay. Still, with all this proof, both at home and abroad, the Alt-Left ignored the prosperity they saw around them, especially in the more classically liberal and therefore libertarian States within the Union. The arrogance of the “Alt-Left” and all other socialists (coastal elites) was that they “knew” what was better for our country, and that the “corrupt” system of Capitalism, at least in their view, was somehow keeping itself afloat by the top 5% of income earners pushing down on the “bottom” 95%. What started with a fight against the 1% by Occupy in 2011 slowly turned into the “other 98%”, and then into the fight against the 5%. Some of us knew this was inevitable. The socialists tried to push for a fight against the top 10% but faced resistance at that level; they were able to sustain somewhat of a political battle against the top 5% in the United States, while starting and continuing to wage a Class War in Alt-Left strongholds like California and New York.

While the country seemed united under economic prosperity, dissent, disdain, and even hatred continued to grow in these Socialist States, brought on by a sense that Socialism would never take hold within the United States at the Federal level. By 2045 this hatred from the Socialist States on the coasts for the Capitalist Libertarian States in the rest of the country had reached a fever pitch. Movements like Calexit were coming up in political debates, and major protests, sometimes violent, broke out across all the States where Socialism had taken hold of the major cities; for it was only in the cities that true freedom of the individual was lost and Groupthink prevailed…

Posted in Red Tide

What is Red Tide?

Red Tide is a fictional universe I’m creating based on this premise: the year the main story starts in is 2045. Since the first election of Donald J. Trump, America has enjoyed a very strong economy with lower taxes and deregulation. For the more libertarian or “classical liberal” States within the Union, prosperity was clearly visible, while in the States, especially on the two coasts, that embraced a more Socialist attitude, economic and social decay advanced. Within the 29 years between 2016 and 2045, the Alt-Left has continued to become disillusioned with the American System of Government because, despite all their work trying to gain political power through our free elections, they only managed to gain a handful of seats in either of the Congressional Houses and never put forth a viable candidate for the Presidency. So the Alt-Left turned further inward, continuing to espouse the slogan they have used since 2016: “The only Solution is Revolution”.

The Red Tide fictional universe is about how the Alt-Left destroyed the Democratic Party in the United States, and follows their attempt to tear apart our great nation from within; attempting to replace our legitimately elected Government with a Communist Regime, using subversive social tactics to rile up their small base of Socialists to lead a legion made up of students and illegal migrants who have been brainwashed by the ever-growing and dangerous group of Marxist Radicals within the Public K-12 Education System, the University System, and the Far-Left-Leaning State Governments (Coastal Elites) within the United States.

This story is supposed to be controversial. It’s supposed to make you think. And my hope in the end is that it helps to establish a dialog on why Socialism in all its forms is inherently against our basic human rights to Life, Liberty, and the Pursuit of Happiness.

Posted in Red Tide

Calculating the value of Phi aka the Golden Ratio in Perl

I was watching a video by 3blue1brown and decided to code up the Continued Fraction of “phi”, aka “The Golden Ratio”. It’s pretty simple: it’s basically a recursive function (here’s a link to the Mathematical Definition of Recursion and the Programming Definition of Recursion) of 1 + 1/x, but you can also code it in a loop. I did a quick Perl script using an infinite loop. You first hit the golden ratio at 9 iterations, and at 11 iterations of the loop the value becomes metastable at a precision of 6 decimal places. Here’s the script and output:
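To make the recurrence behind the loop concrete (this derivation is standard, it is not from the original post): each pass through the loop replaces the current estimate x with 1 + 1/x, and the fixed point of that update is exactly the golden ratio:

x = 1 + 1/x  =>  x^2 = x + 1  =>  x = (1 + sqrt(5)) / 2 ≈ 1.6180339887…

(taking the positive root), which is why repeatedly applying 1 + 1/x, i.e. evaluating the continued fraction, converges to phi.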

Perl Script to calculate Phi using the Continued Fraction (a function call in a loop here):

Standard Output paused at 21 iterations of the loop:

 

Posted in Science

Wisdom is Real and Meaningful

Back in college, a humanities professor I had postulated that Wisdom is meaningless, and that there is an assumption that just because someone is older they are wise, while someone who is younger and educated via the modern Western education system is “smarter” than an older person who is wise. At the time I agreed with him; however, given my own life experience, I believe he and I were both wrong in this assumption. While not all older people are wise, true wisdom can exist.

We see similar terms all the time, such as Street Smarts or Common Sense. These are real. And this type of knowledge is sometimes difficult to convey via writing in the traditional sense of how Western education now takes place. In the past we valued apprenticeships, and I believe this was a type of education that imparted wisdom and knowledge onto the apprentice from the master. So yes, Wisdom is real, and it is extremely valuable. Wisdom, and teaching styles such as apprenticeship, are extremely useful tools in the passing on and development of knowledge.

Happy Easter!

~Robert

Posted in Philosophy

Master Map-Reduce Job – The One and Only ETL Map-Reduce Job you will ever have to write!

It’s fitting that my first article on Big Data would be titled the “Master Map-Reduce Job”. I believe it truly is the one and only Map-Reduce job you will ever have to write, at least for ETL (Extract, Transform and Load) processes. I have been working with Big Data, and specifically with Hadoop, for about two years now, and I achieved my Cloudera Certified Developer for Apache Hadoop (CCDH) certification almost a year ago as of the writing of this post.

So what is the Master Map-Reduce Job? Well, it is a concept I started to architect that would become a framework-level Map-Reduce job implementation that by itself is not a complete job, but instead uses Dependency Injection, AKA a plugin-like framework, to configure a Map-Reduce job specifically for ETL load processes.

Like most frameworks, you can write your process without it; however, what the Master Map-Reduce Job (MMRJ) does is break down certain critical sections of the standard Map-Reduce job program into plugins that are named more specifically for ETL processing, which makes the jump from non-Hadoop-based ETL to Hadoop-based ETL easier for developers not yet initiated into Hadoop.

I think this job is also extremely useful for the Map-Reduce pro who is implementing ETL jobs, or groups of ETL developers that want to create consistent Map-Reduce based loaders, and that’s the real point of the MMRJ. To create a framework for developers to use that will enable them to create robust, consistent, and easily maintainable Map-Reduce based loaders. It follows my SFEMS – Stable, Flexible, Extensible, Maintainable, Scalable development philosophy.

The point of the Master Map Reduce concept framework is to break down the Driver, Mapper, and Reducer into parts that non-Hadoop/Map-Reduce programmers are well familiar with, especially in the ETL world. It is easy for Java developers who build Loaders for a living to understand vocabulary like Validator, Transformer, Parser, OutputFormatter, etc. They can focus on writing business-specific logic and they do not have to worry about the finer points of Map-Reduce.

As a manager, you can now hire a single senior Hadoop/Map-Reduce developer and staff the rest of your team with regular core Java developers, or better yet reuse your existing team. The one senior Hadoop developer maintains your version of the Master Map-Reduce Job framework code, while the rest of your developers focus on developing feed-level loader processes using the framework. In the end all developers can learn Map-Reduce, but you do not need to know Map-Reduce to get started writing loaders that will run on the Hadoop cluster using this framework.

The design is simple and can be shown in this one diagram:

Master_Map-Reduce_Job_Diagram

One of the core concepts that separates the Master Map-Reduce Job Conceptual Framework from a normal Map-Reduce job is how the Mapper and Reducer are structured: the logic that normally would be written directly in the map and reduce functions is now externalized into classes that use vocabulary natively familiar to ETL Java developers, such as Validator, Parser, Transformer, and Output Formatter. It is this externalization that simplifies Map-Reduce development for ETL jobs. I believe that what confuses developers about making Map-Reduce jobs work as robust ETL processes is that the API is too low level. A developer who does not have experience writing complex Map-Reduce jobs will take one look at a map function and a reduce function and say it’s too low level, and perhaps even “I’m not sure exactly what they expect me to do with this.” Developers can be quickly turned off by the raw low-level interface, despite the tremendous power that Map-Reduce exposes.

The code below is the most valuable architectural asset of the framework: in the Master Map-Reduce Job Conceptual Framework, the map method of the Mapper class is broken down into a very simple process flow of FIVE steps that will make sense to any ETL developer. Please read through the comments for each step. Also note that the same thing is done for the Reducer, but only the Transformer and Output Formatter are used. (A sketch of the plugin interfaces themselves follows the code below.)

Map Function turned into an ETL Process Goldmine:


@Override

  public void map(LongWritable key, Text value, OutputCollector output, Reporter reporter) throws IOException {

    String record;

    String[] fields;

    try {

      //First validate the record

      record = value.toString();

      if (validator.validateRecord(record)) {

        //Second Parse valid records into fields

        fields = (String[]) parser.parse(record);

        //Third validate individual tokens or fields

        if (validator.validateFields(fields)) {

          //Fourth run transformation logic

          fields = (String[]) transformer.runMapSideTransform(fields);

          //Fifth output transformed records

          outputFormatter.writeMapSideFormat(key, fields, output);

        }

        else {

          //One or more fields are invalid!

          //For now just record that

          reporter.getCounter(MasterMapReduceCounters.VALIDATION_FAILED_RECORD_CNT).increment(1);

        }

      } //End if validator.validateRecord 

      else {

        //Record is invalid!

        //For now just record, but perhaps more logic

        //to stop the loader if a threshold is reached

        reporter.getCounter(MasterMapReduceCounters.VALIDATION_FAILED_RECORD_CNT).increment(1);

      }

    } //End try block

    catch (MasterMapReduceException e) {

      throw new IOException(e);

    }

  }
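The post does not show the plugin interfaces themselves, so the following is only a sketch inferred from how the plugins are called in the map and reduce methods above; the signatures are my guesses, not the author's actual definitions (OutputCollector here is the raw org.apache.hadoop.mapred.OutputCollector, matching the listings):

public interface Validator {
  boolean validateRecord(String record) throws MasterMapReduceException;
  boolean validateFields(String[] fields) throws MasterMapReduceException;
}

public interface Parser {
  //Returns the parsed fields; the map code casts the result to String[]
  Object parse(String record) throws MasterMapReduceException;
}

public interface Transformer {
  //Map-side transform; the map code casts the result to String[]
  Object runMapSideTransform(Object fields) throws MasterMapReduceException;

  //Reduce-side transform; the reduce code casts the result to String
  Object runReduceSideTransform(Object data) throws MasterMapReduceException;
}

public interface OutputFormatter {
  void writeMapSideFormat(Object key, Object fields, OutputCollector output) throws MasterMapReduceException;
  void writeReduceSideFormat(Object data, OutputCollector output) throws MasterMapReduceException;
}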

Source Code for the Master Map-Reduce Concept Framework:

The source code here should be considered a work in progress. I make no statements to if this actually works, nor has it been stress tested in anyway, and should only be used as a reference. Do not use it directly in mission critical or production applications.

All Code on this page is released under the following open source license:

Copyright 2016 Robert C. Ilardi
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

MasterMapReduceDriver.java – This class is a generic Map-Reduce Driver program which makes use of two classes from the MasterMapReduce concept framework: the “MasterMapReduceConfigDao” and the “PluginController”. Both are responsible for returning configuration data to the MasterMapReduceDriver, as well as (we will see later on) to the Master Mapper and Master Reducer. The MasterMapReduceConfigDao is a standard Data Access Object implementation that wraps data access to HBase, where configuration tables are created that use a “Feed Name” as the row key and have various columns that represent class names or other configuration information such as Job Name, Reducer Task count, etc. The PluginController is a higher-level wrapper around the DAO itself: whereas the DAO is responsible for low-level data access to HBase, the PluginController does the class creation and other high-level functions that make use of the data returned by the DAO. We do not present the implementations for the DAO or the PluginController here because they are simple POJOs that you should implement based on your configuration strategy. Instead of HBase, for example, it can be done via a set of plain text files on HDFS or even the local file system.
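As a rough illustration only (this is not the author's implementation, and the DAO getter names used here are hypothetical placeholders for whatever your configuration strategy provides), a PluginController that instantiates plugins from class names stored by the config DAO might look something like this:

public class PluginController {

  private MasterMapReduceConfigDao confDao;

  public void setConfigurationDao(MasterMapReduceConfigDao confDao) {
    this.confDao = confDao;
  }

  public void init() {
    //Optionally pre-load and cache the feed's configuration here
  }

  //Instantiate a plugin from a class name stored in the configuration source (HBase, files, etc.)
  private Object createPlugin(String className) {
    try {
      return Class.forName(className).newInstance();
    }
    catch (Exception e) {
      throw new RuntimeException("Could not create plugin: " + className, e);
    }
  }

  //getValidatorClassName() is a hypothetical DAO method; name yours to match your config strategy
  public Validator getValidator() {
    return (Validator) createPlugin(confDao.getValidatorClassName());
  }

  //getParser(), getTransformer(), getOutputFormatter(), getMapper(), getReducer(), etc. would
  //follow the same pattern, each reading its class name from the DAO for the current feed.
}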

The Master Map Reduce Driver is responsible for setting up the Map-Reduce Job just like any other standard Map-Reduce Driver. The main difference is that it has been written to make use of the Plugin architecture to configure the job’s parameters dynamically.

/**

 * Created Feb 1, 2016

 */


package com.roguelogic.mrloader;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.conf.Configured;

import org.apache.hadoop.fs.Path;

import org.apache.hadoop.mapred.FileInputFormat;

import org.apache.hadoop.mapred.FileOutputFormat;

import org.apache.hadoop.mapred.JobClient;

import org.apache.hadoop.mapred.JobConf;

import org.apache.hadoop.mapred.RunningJob;

import org.apache.hadoop.util.Tool;

import org.apache.hadoop.util.ToolRunner;

/**

 * @author Robert C. Ilardi

 *

 */

public class MasterMapReduceDriver extends Configured implements Tool {

  public static final String MMR_FEED_NAME = "RL.MasterMapReduce.FeedName";

  private MasterMapReduceConfigDao confDao;

  private PluginController pluginController;

  private String feedName;

  private String mmrJobName;

  private String inputPath;

  private String outputPath;

  public MasterMapReduceDriver() {

    super();

  }

  public synchronized void init(String feedName) {

    System.out.println("Initializing MasterMapReduce Driver for Feed Name: " + feedName);

    this.feedName = feedName;

    //Create MMR Configuration DAO (Data Access Object)

    confDao = new MasterMapReduceConfigDao();

    confDao.init(feedName); //Initialize Config DAO for specific Feed Name

    //Read Driver Level Properties

    mmrJobName = confDao.getLoaderJobNameByFeedName();

    inputPath = confDao.getLoaderJobInputPath();

    outputPath = confDao.getLoaderJobOutputPath();

    //Configure MMR Plugin Controller

    pluginController = new PluginController();

    pluginController.setConfigurationDao(confDao);

    pluginController.init();

  }

  @Override

  public int run(String[] args) throws Exception {

    JobConf jConf;

    Configuration conf;

    int res;

    conf = getConf();

    jConf = new JobConf(conf, this.getClass());

    jConf.setJarByClass(this.getClass());

    //Set some shared parameters to send to Mapper and Reducer

    jConf.set(MMR_FEED_NAME, feedName);

    configureBaseMapReduceComponents(jConf);

    configureBaseMapReduceOutputFormat(jConf);

    configureBaseMapReduceInputFormat(jConf);

    res = startMapReduceJob(jConf);

    return res;

  }

  private void configureBaseMapReduceInputFormat(JobConf jConf) {

    Class clazz;

    clazz = pluginController.getInputFormat();

    jConf.setInputFormat(clazz);

    FileInputFormat.setInputPaths(jConf, new Path(inputPath));

  }

  private void configureBaseMapReduceOutputFormat(JobConf jConf) {

    Class clazz;

    clazz = pluginController.getOutputKey();

    jConf.setOutputKeyClass(clazz);

    clazz = pluginController.getOutputValue();

    jConf.setOutputValueClass(clazz);

    clazz = pluginController.getOutputFormat();

    jConf.setOutputFormat(clazz);

    FileOutputFormat.setOutputPath(jConf, new Path(outputPath));

  }

  private void configureBaseMapReduceComponents(JobConf jConf) {

    Class clazz;

    int cnt;

    //Set Mapper Class

    clazz = pluginController.getMapper();

    jConf.setMapperClass(clazz);

    //Optionally Set Custom Reducer Class

    clazz = pluginController.getReducer();

    if (clazz != null) {

      jConf.setReducerClass(clazz);

    }

    //Optionally explicitly set number of reducers if available

    if (pluginController.hasExplicitReducerCount()) {

      cnt = pluginController.getReducerCount();

      jConf.setNumReduceTasks(cnt);

    }

    //Set Partitioner Class if a custom one is required for this Job

    clazz = pluginController.getPartitioner();

    if (clazz != null) {

      jConf.setPartitionerClass(clazz);

    }

    //Set Combiner Class if a custom one is required for this Job

    clazz = pluginController.getCombiner();

    if (clazz != null) {

      jConf.setCombinerClass(clazz);

    }

  }

  private int startMapReduceJob(JobConf jConf) throws IOException {

    int res;

    RunningJob job;

    job = JobClient.runJob(jConf);

    res = 0;

    return res;

  }

  public static void main(String[] args) {

    int exitCd;

    MasterMapReduceDriver mmrDriver;

    Configuration conf;

    String feedName;

    if (args.length < 1) {

      exitCd = 1;

      System.err.println("Usage: java " + MasterMapReduceDriver.class + " [FEED_NAME]");

    }

    else {

      try {

        feedName = args[0];

        conf = new Configuration();

        mmrDriver = new MasterMapReduceDriver();

        mmrDriver.init(feedName);

        exitCd = ToolRunner.run(conf, mmrDriver, args);

      } //End try block

      catch (Exception e) {

        exitCd = 1;

        e.printStackTrace();

      }

    }

    System.exit(exitCd);

  }

}


Code Formatted by ToGoTutor

BaseMasterMapper.java – This is an abstract base class that implements the configure method of the Mapper to make use of the DAO and PluginController already described above. It should be extended by all the Mapper implementations you use when creating a Map-Reduce job with the Master Map Reduce concept framework. In the future we might add additional helper functions in this class for the mappers to use. In the end you only need a finite number of Mapper implementations; it is envisioned that the number of mappers is related more to the number of file formats you have than to the number of feeds. The idea of the framework is that you should not have to write the lower-level components of a Map-Reduce job at the feed level; instead developers should focus on the business logic, such as validation logic and transformation logic. This logic runs in a Map-Reduce job simply because it needs to run on the Hadoop cluster; otherwise these loader jobs execute logic like any other standard loader job running outside of the Hadoop cluster.

/**

 * Created Feb 1, 2016

 */


package com.roguelogic.mrloader;

import org.apache.hadoop.mapred.JobConf;

import org.apache.hadoop.mapred.MapReduceBase;

/**

 * @author Robert C. Ilardi

 *

 */

public abstract class BaseMasterMapper extends MapReduceBase {

  protected String feedName;

  protected MasterMapReduceConfigDao confDao;

  protected PluginController pluginController;

  protected Validator validator; //Used to validate Records and Fields

  protected Parser parser; //Used to parse records into fields

  protected Transformer transformer; //Used to run transformation logic on fields

  protected OutputFormatter outputFormatter; //Used to write out formatted records

  public BaseMasterMapper() {

    super();

  }

  @Override

  public void configure(JobConf conf) {

    feedName = conf.get(MasterMapReduceDriver.MMR_FEED_NAME);

    confDao = new MasterMapReduceConfigDao();

    confDao.init(feedName);

    pluginController = new PluginController();

    pluginController.setConfigurationDao(confDao);

    pluginController.init();

    validator = pluginController.getValidator();

    parser = pluginController.getParser();

    transformer = pluginController.getTransformer();

    outputFormatter = pluginController.getOutputFormatter();

  }

}


Code Formatted by ToGoTutor

BaseMasterReducer.java – Just like on the Mapper side, this class is the base class for all Reducer implementations that are used with the Master Map-Reduce Job framework. Like the BaseMasterMapper class, it implements the configure method and provides access to the DAO and PluginController for reducer implementations. Again, in the future we may expand this to include additional helper functions.

/**

 * Created Feb 1, 2016

 */


package com.roguelogic.mrloader;

import org.apache.hadoop.mapred.JobConf;

import org.apache.hadoop.mapred.MapReduceBase;

/**

 * @author Robert C. Ilardi

 *

 */

public abstract class BaseMasterReducer extends MapReduceBase {

  protected String feedName;

  protected MasterMapReduceConfigDao confDao;

  protected PluginController pluginController;

  protected Transformer transformer; //Used to run transformation logic on fields

  protected OutputFormatter outputFormatter; //Used to write out formatted records

  public BaseMasterReducer() {

    super();

  }

  @Override

  public void configure(JobConf conf) {

    feedName = conf.get(MasterMapReduceDriver.MMR_FEED_NAME);

    confDao = new MasterMapReduceConfigDao();

    confDao.init(feedName);

    pluginController = new PluginController();

    pluginController.setConfigurationDao(confDao);

    pluginController.init();

    transformer = pluginController.getTransformer();

    outputFormatter = pluginController.getOutputFormatter();

  }

}


Code Formatted by ToGoTutor

StringRecordMasterMapper.java – This is an example implementation of what a Master Mapper would look like. Note that it has nothing to do with a specific feed; instead it is tied to the file format. Specifically, this class would make sense as a mapper for a delimited text file format.


/**

 * Created Feb 1, 2016

 */


package com.roguelogic.mrloader;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapred.Mapper;

import org.apache.hadoop.mapred.OutputCollector;

import org.apache.hadoop.mapred.Reporter;

/**

 * @author Robert C. Ilardi

 *

 */

public class StringRecordMasterMapper extends BaseMasterMapper implements Mapper {

  public StringRecordMasterMapper() {

    super();

  }

  @Override

  public void map(LongWritable key, Text value, OutputCollector output, Reporter reporter) throws IOException {

    String record;

    String[] fields;

    try {

      //First validate the record

      record = value.toString();

      if (validator.validateRecord(record)) {

        //Second Parse valid records into fields

        fields = (String[]) parser.parse(record);

        //Third validate individual tokens or fields

        if (validator.validateFields(fields)) {

          //Fourth run transformation logic

          fields = (String[]) transformer.runMapSideTransform(fields);

          //Fifth output transformed records

          outputFormatter.writeMapSideFormat(key, fields, output);

        }

        else {

          //One or more fields are invalid!

          //For now just record that

          reporter.getCounter(MasterMapReduceCounters.VALIDATION_FAILED_RECORD_CNT).increment(1);

        }

      } //End if validator.validateRecord 

      else {

        //Record is invalid!

        //For now just record, but perhaps more logic

        //to stop the loader if a threshold is reached

        reporter.getCounter(MasterMapReduceCounters.VALIDATION_FAILED_RECORD_CNT).increment(1);

      }

    } //End try block

    catch (MasterMapReduceException e) {

      throw new IOException(e);

    }

  }

}


Code Formatted by ToGoTutor

StringRecordMasterReducer.java – This is an example implementation of what the Master Reducer would look like. It complements the StringRecordMasterMapper from above, in that it works well with text line / delimited file formats. The idea here is that the Mapper parses and transforms raw feed data into a canonical data model and outputs that transformed data in a similar delimited text file format. Most likely the Reducer implementation can simply be a pass-through. It’s possible that a reducer in this case is not even needed, and we can configure the Master Map Reduce Driver to be a Map-Only job (see the note after the code below).


/**

 * Created Feb 1, 2016

 */


package com.roguelogic.mrloader;

import java.io.IOException;

import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;

import org.apache.hadoop.io.NullWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapred.OutputCollector;

import org.apache.hadoop.mapred.Reducer;

import org.apache.hadoop.mapred.Reporter;

/**

 * @author Robert C. Ilardi

 *

 */

public class StringRecordMasterReducer extends BaseMasterReducer implements Reducer {

  public StringRecordMasterReducer() {

    super();

  }

  @Override

  public void reduce(LongWritable key, Iterator values, OutputCollector output, Reporter reporter) throws IOException {

    String data;

    Text txt;

    try {

      while (values.hasNext()) {

        txt = values.next();

        data = txt.toString();

        //First run transformation logic

        data = (String) transformer.runReduceSideTransform(data);

        //Second output transformed records

        outputFormatter.writeReduceSideFormat(data, output);

      } //End while (values.hasNext()) 

    } //End try block

    catch (MasterMapReduceException e) {

      throw new IOException(e);

    }

  }

}


Code Formatted by ToGoTutor
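On that last point about a Map-Only job: with the old mapred API used in these listings, this is just a matter of setting the reduce task count to zero in the driver. A minimal sketch (in the MMRJ setup this could equally be driven by the PluginController's reducer-count configuration rather than hard-coded):

//In MasterMapReduceDriver.configureBaseMapReduceComponents(), a feed that needs no reducer
//could simply be configured with zero reduce tasks; mapper output then goes straight to the output format.
jConf.setNumReduceTasks(0);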

Conclusion

In the end, some may ask how much value a framework like this adds. Isn’t Map-Reduce simple enough? Well, the truth is we need to ask this of all the frameworks and wrappers we use: is their inclusion worth it? I think in this case the Master Map Reduce framework does add value. It breaks down the Driver, Mapper, and Reducer into parts that non-Hadoop/Map-Reduce programmers are well familiar with, especially in the ETL world. It is easy for Java developers who build Loaders for a living to understand vocabulary like Validator, Transformer, Parser, OutputFormatter, etc. They can focus on writing business-specific logic and they do not have to worry about the finer points of Map-Reduce. Combine this with the fact that this framework creates an environment where you can create hundreds of Map-Reduce programs, one for each feed you are loading, each with the exact same Map-Reduce structure, and I believe this framework is well worth it.

Just Another Stream of Random Bits…
– Robert C. Ilardi
Posted in Big Data, Development

Synthetic Transactions and Capability Monitoring of your Enterprise Architecture

Back in my days at Lehman Brothers, I was introduced to the concept of “Synthetic Transactions”. That is, an automated action that is scheduled to execute periodically to monitor the performance and availability of one or more components in your enterprise architecture.

Most architects will use SNMP, simple pinging of servers, routers, networks, etc., and monitoring of things like Disk Space, CPU Usage, and Memory Usage; pretty much anything that can be recorded via HP OpenView / HP BTO (Business Technology Optimization). I believe this is OK for infrastructure monitoring, but for application monitoring, which I believe gives you a better view into the health of your Enterprise Architecture as it matters to the real users and clients, Synthetic Transactions are far superior.

Synthetic Transactions go further than simple network or infrastructure monitoring, and further than even simple application performance metrics monitoring with, say, a tool like ITRS’s Geneos. A Synthetic Transaction is really about testing the capabilities of your systems and applications from the viewpoint of an end user or a calling client system, to ensure that the system is available with the capabilities and performance profile agreed upon by the contract set in your requirements.

Synthetic Transactions are not always easy to implement, and great care must be put into planning their inclusion from the beginning of system design and architecture analysis; they should be part of your Non-Functional Requirements.

Also, in terms of Information Security and Intrusion Detection, Synthetic Transactions are a way to start implementing the next phase of network defenses. As you all know, in today’s world firewalls are no longer sufficient to keep the hackers out of your systems. More and more hackers have already turned to attacking specific application weaknesses instead of going after the raw network infrastructure, as the infrastructure was the first and easiest thing for organizations to shore up in their security.

While Synthetic Transactions won’t prevent cyber attacks or increase security by themselves, the detailed component-level monitoring and performance metrics collection that Synthetic Transactions provide can potentially help identify applications, or components of applications, that are under attack or have been compromised, by surfacing the performance or application behavioral issues caused by hackers attacking your applications.

Microsoft has a good outline of what a Synthetic Transaction is; although they relate it to their Operations Manager product, the general information is valid regardless of whether you use a tool or develop your own Synthetic Transaction Agents. Specifically, Microsoft states in this article: “Synthetic transactions are actions, run in real time, that are performed on monitored objects. You can use synthetic transactions to measure the performance of a monitored object and to see how Operations Manager reacts when synthetic stress is placed on your monitoring settings. For example, for a Web site, you can create a synthetic transaction that performs the actions of a customer connecting to the site and browsing through its pages. For databases, you can create transactions that connect to the database. You can then schedule these actions to occur at regular intervals to see how the database or Web site reacts and to see whether your monitoring settings, such as alerts and notifications, also react as expected.”

Another good definition, though more of a summary than what Microsoft outlined, is available on Wikipedia in the Operational Intelligence article, specifically the section on System Monitoring, where they state: “Capability monitoring usually refers to synthetic transactions where user activity is mimicked by a special software program, and the responses received are checked for correctness.”

Although Wikipedia does not have a lot of direct information about Synthetic Transactions, I do like their term “Capability Monitoring”, which is exactly what Synthetic Transactions attempt to do: monitor the capabilities of your system at any given moment, to give you, your developers, and your operations support staff a dashboard-level view into how your system is performing, which components are available, and, through the performance measures, the health of each of your system’s components and therefore the overall health of your system and applications.

Back at Lehman, and if you look at the Microsoft description, most times a Synthetic Transaction focuses on a single aspect of the system; for example, checking if you are able to open a connection to a database. While this is a valid Synthetic Transaction, it is extremely simple, and may not provide you with enough information to tell if your application is actually available from an end user or client system standpoint.

What I developed as a model for Synthetic Transactions back in 2006 was the ability for my Transactions to interact with multiple tiers of my architecture, if not all tiers.

The application which I was developing Synthetic Transactions for was a Reference Data system that included Desktop and Web based Front Ends, a JavaEE (J2EE at the time) based Middleware, a Relational Database, a Workflow Engine, and a Message Publisher, among other various supporting components such as ETL processes and other batch processing.

The most useful test in this case would be one that touched the Middleware, interacted with the workflow engine, retrieved data from the database and potentially updated test records, and had those test messages published and received by the Synthetic Transaction Agent to verify the full flow of the system.

Creating the Agent:

To create the Agent that would initiate the Transactions, I used a job scheduler such as Autosys or Control-M to kick the process off every couple of hours to collect metrics. (Since the application was a global app used 24 x 7, it was important that the application was not only available but also performant around the clock, and we needed to be alerted if the application was performing outside an acceptable range, and to know which component was affected.)

The Agent itself was a client of the middleware. Since all services such as the Database and the Workflow Engine were wrapped by the middleware, we could have the agent invoke different APIs that would perform a Database Search and record metrics, and call an API that would create a Workflow request, and move it automatically through the workflow steps.

At the end of the workflow, we were able to trigger the messaging publisher to broadcast a message. Since our Data Model allowed for Test records, and we built into our requirements that consumers generally filter out or otherwise ignore Test records in the message flow, we were able to send out test messages in the production environment that would not affect any of our downstream clients.

However, our Agent process could start up a message listener and listen for test records specifically. The Agent, by recording the start time of the workflow transaction and the receive time of the test record message, could then calculate the round-trip time of data flowing through the system.

Each individual API call, from invocation to return, could also be timed to test how each API was performing.

In terms of ETL, since the Data Model again allowed for test records, we were able to create a small file of test records and trigger the ETL process as well. The records in the database would be updated, in some cases with just a timestamp update, but it would still be a valid test, and valid metrics could still be collected.

Together this gave us good dashboard view of the system’s availability and performance at a given time. If we wanted to increase the resolution all we had to do was decrease the period between each job start of the Agents.

We recorded the metrics in a database table, and created a simple web page, which production support teams could use to monitor the Synthetic Transactions and their reported metrics.

On a side note: if your APIs and libraries are written in Java and already record metrics that your developers use for debugging and unit testing, you can expose these directly via JMX, which can be accessed and used directly if your Synthetic Transaction Agent process(es) are also written in Java. Or you can create a separate function or API that returns the internal metrics recorded by your libraries, frameworks, and API deployments.
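As a rough sketch of that JMX idea (class names and the metric exposed here are illustrative, not from the original system), a library can register a standard MBean that a Java-based agent, or a tool like JConsole, can then read:

//ApiMetricsMBean.java – standard JMX convention: management interface named <ImplementationClass>MBean
public interface ApiMetricsMBean {
  long getLastSearchElapsedMillis();
}

//ApiMetrics.java
import java.lang.management.ManagementFactory;
import javax.management.ObjectName;

public class ApiMetrics implements ApiMetricsMBean {

  private volatile long lastSearchElapsedMillis;

  //Called by the library code after each timed API call
  public void recordSearchElapsed(long millis) {
    lastSearchElapsedMillis = millis;
  }

  //Attribute readable over JMX
  @Override
  public long getLastSearchElapsedMillis() {
    return lastSearchElapsedMillis;
  }

  //Registers this MBean with the platform MBean server under an illustrative object name
  public static ApiMetrics register() throws Exception {
    ApiMetrics metrics = new ApiMetrics();
    ManagementFactory.getPlatformMBeanServer().registerMBean(metrics, new ObjectName("com.example.app:type=ApiMetrics"));
    return metrics;
  }

}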

A number of years ago, I developed a Performance Metrics object model and small set of helper functions for Java that I have been using for over a decade and I find that even today they are still the most useful performance metrics I can collect. Perhaps I will write up an article on collecting performance metrics in the applications you develop and share that simple object model and helper functions.

Automated alerts, such as paging the on-call support staff, could also be accomplished by simply specifying how many seconds or milliseconds a call to an API should take; if that period is exceeded, the Agent would send out email and paging alerts.
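To make that concrete, here is a minimal sketch (purely illustrative; the class, method names, and threshold are hypothetical, not from the original Lehman system) of an agent check that times a call, records the metric, and alerts when a threshold is breached:

import java.util.concurrent.Callable;

public class SyntheticTransactionAgent {

  //Example SLA threshold in milliseconds (an assumption for illustration only)
  private static final long MAX_ELAPSED_MILLIS = 2000;

  //Times a single synthetic check (e.g. a middleware search API call wrapped in a Callable),
  //records the metric, and raises an alert if the check fails or breaches the threshold.
  public void runCheck(String checkName, Callable<Boolean> check) {
    long start = System.currentTimeMillis();
    boolean ok;

    try {
      ok = Boolean.TRUE.equals(check.call());
    }
    catch (Exception e) {
      ok = false;
    }

    long elapsed = System.currentTimeMillis() - start;

    recordMetric(checkName, elapsed, ok);

    if (!ok || elapsed > MAX_ELAPSED_MILLIS) {
      sendAlert(checkName, elapsed, ok);
    }
  }

  private void recordMetric(String checkName, long elapsed, boolean ok) {
    //In the real setup this would insert a row into the metrics table backing the dashboard
    System.out.println(checkName + " elapsed=" + elapsed + "ms ok=" + ok);
  }

  private void sendAlert(String checkName, long elapsed, boolean ok) {
    //In the real setup this would send email / paging alerts to the on-call support staff
    System.err.println("ALERT: " + checkName + " elapsed=" + elapsed + "ms ok=" + ok);
  }

}

A scheduler job (Autosys, Control-M, cron, etc.) would then simply construct the agent and invoke runCheck once per monitored capability.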

In the end, a lot of organizations have a Global Technology and Architecture Principle that mandates all their applications have some sort of automated system testing.

This can be accomplished by using the Synthetic Transaction paradigm.

It is worth noting that creating an architecture that supports Synthetic Transactions is not simple. You need to ensure that all components, especially your data and information models, allow for test records.

A way around the information model requirement is to roll back all transactions on your database instead of committing them. This would force you to have a flag or a special API, separate from the normal data flow in your system, to ensure data is not permanently written to your database. The issue here is that if you implement it this way, you cannot have a true end-to-end flow of test records in production. Still, you will be able to get most of the metrics you need.
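A minimal sketch of that rollback approach using plain JDBC (the table, column, and test record id here are made up for illustration):

import java.sql.Connection;
import java.sql.PreparedStatement;

public class RollbackProbe {

  //Times a synthetic update against the real database but rolls it back so nothing is committed.
  public long timeSyntheticUpdate(Connection conn) throws Exception {
    long start = System.currentTimeMillis();
    boolean oldAutoCommit = conn.getAutoCommit();

    conn.setAutoCommit(false);

    try (PreparedStatement ps = conn.prepareStatement(
        "UPDATE ref_data SET last_update_ts = CURRENT_TIMESTAMP WHERE record_id = ?")) {
      ps.setLong(1, -1L); //hypothetical test record id
      ps.executeUpdate();
    }
    finally {
      conn.rollback(); //never commit the synthetic change
      conn.setAutoCommit(oldAutoCommit);
    }

    return System.currentTimeMillis() - start;
  }

}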

Also, if your organization only mandates a certain level of automated testing or performance and availability monitoring, then perhaps true end-to-end data flow through your system is not required.

In my experience, however, even if the company I work for does not mandate true end-to-end testing, as a responsible application owner I prefer to have true end-to-end data flow testing available to me, so I can monitor my systems more accurately and give proper answers to stakeholders when users and client systems complain about performance or system availability.

Just Another Stream of Random Bits…
– Robert C. Ilardi
 
 
Posted in Architecture