Some of you may be asking yourselves why I have read Marx and why I am even willing to go see his statue while I’m in town in Berlin. The answer is simple: “Know thy self, know thy enemy. A thousand battles, a thousand victories.” – Sun Tzu; The Art of War. And yes, Socialism in all its forms is my enemy. Why? Because socialism is the enemy of individual freedom by its very definition. And an enemy of freedom is also my enemy…
Today, Alex Jones’ InfoWars was removed from the social media platforms: Facebook, Spotify, YouTube, and Apple iTunes PodCasts. This is a major attack on the freedom of expression on today’s Internet.
You don’t need to like Alex Jones or agree with him, but he does have the right to his opinions and he should have the protected right to say what he wishes on the Internet.
It should be up to the consumer to “change the channel” if they don’t like what his brand of content says. We cannot give up our right to share ideas and read other people’s ideas to a handful of big tech firms. The Internet has become the new Public Forum, and therefore our speech and writing on the Internet needs to be protected by the First Amendment.
When I started creating content on the Internet back in 1994, I was in high school and was creating HTML pages by hand on a web hosting platform known as GeoCities.
On today’s Internet, users do not need to know how to code even a simple markup language like HTML; instead they can use “Social Media” tools like Facebook and Twitter, or post videos to YouTube and other video hosting sites.
It has become easier than ever to use the Internet to share our ideas, and for most of us, at least in the Western world, the majority of our daily communication is now done on the Internet, usually via a couple of dozen web sites at most.
Internet Censorship is on the rise, and we need to put a stop to it once and for all.
I am a capitalist through and through, but more than that I’m an American and somewhat of a Constitutionalist, especially when it comes to our rights, like Freedom of Speech and Freedom of the Press.
I know the First Amendment is meant to protect speech in the public square; however, the new public square is these Social Media sites.
So I’m asking everyone who believes in their own right to share their thoughts, passions, and opinions to start making calls and writing to their Senators and Representatives, at the Federal level but also at the State level as well. Please ask them to work towards a bill, and hopefully an Amendment, that basically says that if a Technology Company, Web Hosting Service, Domain Registrar, Social Media Platform, or Media Sharing Platform like YouTube and Apple’s iTunes, as well as Search Engines like Google and Bing, want to continue to operate within the United States, they MUST respect the First Amendment.
We aren’t talking about making private companies government owned, but just like any other telecommunications company like your Land Line Phone Company, Cellular Phone Company, your Cable Company, and Broadcast Radio and TV, among others, they need to be regulated to prevent them from removing anyone’s content.
Let it fall to the realm of the US Courts to determine if someone’s account violates an actual Law. This way everyone’s Due Process Rights are protected, and everyone’s rights to a fair and free Public Forum on the Internet are protected.
The Democratic Party in the United States around the year 2045 was in shambles; a mere shadow of its former self. After approximately 4 major US Senate Election Cycles and 7 US Presidential Election Cycles, the Democratic Party only continued to fracture between what in 2018 were known as the Neo-Liberals and the “Alt-Left” (also known as the Far Left in certain circles). The DSA (Democratic Socialists of America) continued to gain traction among the Alt-Left but was not able to gain more than a handful of seats in both houses of Congress and never had a truly viable contender for the Presidency. However, the Alt-Left on both coasts of the United States, and in a handful of other states to which liberals had migrated in overwhelming numbers, bringing their socialist ideas along with them, became ever more disgruntled with their battle to wage a political war using the System. There had been cries by the Alt-Left since before President Donald J. Trump’s first election win in November 2016 that the “System” was too broken to fix from within, and protesters and the far-left had said the “Only Solution is Revolution”; that call had only grown stronger in the 29 years since that monumental election of 2016. Meanwhile the policies of President Trump, the Republicans, and those few Democrats who decided to break with party lines and reach across the aisle to join the new Renaissance in America created a very powerful Economy for the American People. The picture was not as rosy for the rest of the world, and wherever Socialism gained power, those nations fell into Economic and Social decay. Still, with all this proof, both at home and abroad, the Alt-Left ignored the prosperity they saw around them, especially in the more classically liberal and therefore libertarian States within the Union.
The arrogance of the “Alt-Left” and all other socialists (coastal elites) was the belief that they “knew” what was better for our country, and that the “corrupt” system of Capitalism, at least in their view, was somehow keeping itself afloat by the top 5% of income earners pushing down on the “bottom” 95%. What started with a fight against the 1% by Occupy in 2011 slowly turned into the “other 98%” and then into the fight against the 5%. Some of us knew this was inevitable. The socialists tried to push for a fight against the top 10% but faced resistance at this level; they were, however, able to sustain somewhat of a political battle against the top 5% in the United States by starting and continuing to wage a Class War in the Alt-Left strongholds like California and New York. While the country seemed united under economic prosperity, dissent, disdain, and even hatred continued to grow in these Socialist States, brought on by a sense that Socialism would never take hold within the United States at a Federal level. By 2045 this hatred from the Socialist States on the coasts for the Capitalist Libertarian States in the rest of the country reached a fever pitch. Movements like Calexit were coming up in political debates and major protests, sometimes violent, across all the States where Socialism took hold of their major cities; as it was only in the cities where Groupthink prevailed over the true freedom of the individual…
Red Tide is a Fictional Universe I’m creating based on this premise: The year the main story starts in is 2045. Since the first election of Donald J. Trump, America has enjoyed a very strong economy with lower taxes and deregulation. For the more libertarian or “classical liberal” States within the Union, prosperity was clearly visible, while in States, especially on the two coasts, that embraced a more Socialist attitude, economic and social decay advanced. Within the 29 years between 2016 and 2045, the Alt-Left has continued to become disillusioned with the American System of Government, because despite all their work trying to gain political power through our free elections, they only managed to gain a handful of seats in either of the Congressional Houses and they never put forth a viable candidate for the Presidency. So the Alt-Left turned more inward, continuing to espouse the slogan they have used since 2016: “The only Solution is Revolution”.
The Red Tide fictional universe is about how the Alt-Left destroyed the Democratic Party in the United States and follows their attempt to tear apart our great nation from within; attempting to replace our legitimate elected Government with a Communist Regime using subversive social tactics to rile up their small base of Socialists to lead a legion made up of students and illegal migrants who have been brainwashed by the ever growing and dangerous group of Marxist Radicals within the Public K-12 Education System, the University System, and the Far-Left Leaning State Governments (Coastal Elites) within the United States.
This story is supposed to be controversial. It’s supposed to make you think. And my hope in the end is that it helps to establish a dialog on why Socialism in all its forms is inherently against our basic human rights to Life, Liberty, and the Pursuit of Happiness.
Back in College, a humanities professor of mine postulated that Wisdom is meaningless: there is an assumption that just because someone is older they are wise, while someone who is younger and educated via the modern western education system is “smarter” than an older person who is wise. At the time I agreed with him; however, given my own life experience, I believe he and I were both wrong in this assumption, and while not all older people are wise, true wisdom can exist.
We see similar terms all the time, such as Street Smarts or Common Sense. These are real. And this type of knowledge is sometimes difficult to explain via writing in the traditional sense of how western education now takes place. In the past we valued apprenticeships, and I believe this was a type of education that imparted wisdom and knowledge onto the apprentice by the master. So yes, Wisdom is real, and is extremely valuable. Wisdom and teaching styles such as Apprenticeship are extremely useful tools in the passing and development of knowledge.
It’s fitting that my first article on Big Data would be titled the “Master Map-Reduce Job”. I believe it truly is the one and only Map-Reduce job you will ever have to write, at least for ETL (Extract, Transform and Load) Processes. I have been working with Big Data and specifically with Hadoop for about two years now and I achieved my Cloudera Certified Developer for Apache Hadoop (CCDH) almost a year ago as of the writing of this post.
So what is the Master Map-Reduce Job? Well, it is a concept I started to architect that would become a framework level Map-Reduce job implementation that by itself is not a complete job, but uses Dependency Injection, AKA a plugin-like framework, to configure a Map-Reduce Job specifically for ETL processes.
Like most frameworks, you can write your process without it; however, what the Master Map-Reduce Job (MMRJ) does is break down certain critical sections of the standard Map-Reduce job program into plugins whose names are more specific to ETL processing, so it makes the jump from non-Hadoop based ETL to Hadoop based ETL easier for non-Hadoop-initiated developers.
I think this job is also extremely useful for the Map-Reduce pro who is implementing ETL jobs, or groups of ETL developers that want to create consistent Map-Reduce based loaders, and that’s the real point of the MMRJ: to create a framework for developers to use that will enable them to create robust, consistent, and easily maintainable Map-Reduce based loaders. It follows my SFEMS – Stable, Flexible, Extensible, Maintainable, Scalable development philosophy.
The point of the Master Map Reduce concept framework is to break down the Driver, Mapper, and Reducer into parts that non-Hadoop/Map-Reduce programmers are well familiar with, especially in the ETL world. It is easy for Java developers who build Loaders for a living to understand vocabulary like Validator, Transformer, Parser, OutputFormatter, etc. They can focus on writing business specific logic and they do not have to worry about the finer points of Map-Reduce.
As a manager you can now hire a single senior Hadoop/Map-Reduce developer and hire normal core Java developers for the rest of your team, or better yet reuse your existing team. The one senior Hadoop developer maintains your version of the Master Map-Reduce Job framework code, and the rest of your developers focus on developing feed level loader processes using the framework. In the end all developers can learn Map-Reduce, but you do not need to know Map-Reduce to get started writing loaders that will work on the Hadoop cluster by using this framework.
The design is simple and can be shown by this one diagram:
One of the core concepts that separates the Master Map-Reduce Job Conceptual Framework from a normal Map-Reduce Job is how the Mapper and Reducer are structured: the logic that normally would be written directly in the map and reduce functions is now externalized into classes that use vocabulary that is natively familiar to ETL Java Developers, such as Validator, Parser, Transformer, and Output Formatter. It is this externalization that simplifies ETL Map-Reduce job development. I believe that what confuses developers about how to make Map-Reduce jobs work as robust ETL processes is that the interface is too low level. A developer who does not have experience with writing complex Map-Reduce jobs will take one look at a map function and a reduce function and say it’s too low level, and perhaps even, “I’m not sure exactly what they expect me to do with this.” Developers can be quickly turned off by the raw, low level, although tremendously powerful, interface that Map-Reduce exposes.
It is the code below that forms the most valuable architectural asset of the framework: in the Master Map-Reduce Job Conceptual Framework we have broken down the map method of the Mapper class into a very simple process flow of FIVE steps that will make sense to any ETL Developer. Please read through the comments for each step. Also note that the same thing is done for the Reducer, but only the Transformer and Output Formatter are used.
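The original listing is not reproduced in this text, so as a stand-in here is a minimal sketch of what such a plugin-driven map flow could look like. Everything in it is an assumption made for illustration: the interface signatures, the class name EtlMapSketch, and the particular split into parse, validate, transform, format, and emit are hypothetical, and plain Java collections are used in place of the Hadoop Mapper types so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical plugin interfaces. The names follow the ETL vocabulary used
// above; the method signatures are assumptions made for this sketch.
interface Parser { String[] parse(String rawLine); }
interface Validator { boolean isValid(String[] fields); }
interface Transformer { String[] transform(String[] fields); }
interface OutputFormatter { String format(String[] fields); }

// Stand-in for a mapper: no Hadoop types, so it can run anywhere.
class EtlMapSketch {
  private final Parser parser;
  private final Validator validator;
  private final Transformer transformer;
  private final OutputFormatter formatter;

  EtlMapSketch(Parser p, Validator v, Transformer t, OutputFormatter f) {
    this.parser = p;
    this.validator = v;
    this.transformer = t;
    this.formatter = f;
  }

  // Processes one input line; returns how many records were emitted (0 or 1).
  int map(String rawLine, List<String> output) {
    String[] fields = parser.parse(rawLine);             // Step 1: parse the raw record
    if (!validator.isValid(fields)) {                    // Step 2: validate it
      return 0;                                          // rejected records are skipped
    }
    String[] canonical = transformer.transform(fields);  // Step 3: transform to the target model
    String line = formatter.format(canonical);           // Step 4: format for output
    output.add(line);                                    // Step 5: emit
    return 1;
  }

  public static void main(String[] args) {
    EtlMapSketch mapper = new EtlMapSketch(
        raw -> raw.split(",", -1),
        f -> f.length == 2 && !f[0].isEmpty(),
        f -> new String[] { f[0].trim().toUpperCase(Locale.ROOT), f[1].trim() },
        f -> String.join("|", f));
    List<String> out = new ArrayList<>();
    mapper.map("ibm, 101.5", out);  // valid record
    mapper.map(",broken", out);     // empty key, rejected by the validator
    System.out.println(out);
  }
}
```

In a real job the four plugins would be resolved per feed from configuration rather than passed in as lambdas, and the emit step would write to the Map-Reduce output collector instead of a list.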
Source Code for the Master Map-Reduce Concept Framework:
The source code here should be considered a work in progress. I make no statements as to whether this actually works, nor has it been stress tested in any way, and it should only be used as a reference. Do not use it directly in mission critical or production applications.
All Code on this page is released under the following open source license:
Copyright 2016 Robert C. Ilardi
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
MasterMapReduceDriver.java – This class is a generic Map-Reduce Driver program, which makes use of two classes from the MasterMapReduce concept framework: the “MasterMapReduceConfigDao” and the “PluginController”. Both are responsible for returning configuration data to the MasterMapReduceDriver, as well as (we will see later on) to the Master Mapper and Master Reducer. The MasterMapReduceConfigDao is a standard Data Access Object implementation that wraps data access to HBase, where configuration tables are created that use a “Feed Name” as the row key and have various columns that represent class names or other configuration information such as Job Name, Reducer Task number, etc. The PluginController is a higher level wrapper around the DAO: whereas the DAO is responsible for low level data access to HBase, the PluginController does the class creation and other high level functions that make use of the data returned by the DAO. We do not present the implementations of the DAO or the PluginController here because they are simple POJOs that you should implement based on your configuration strategy. Instead of HBase, for example, the configuration can be stored in a set of plain text files on HDFS or even on the local file system.
The Master Map Reduce Driver is responsible for setting up the Map-Reduce Job just like any other standard Map-Reduce Driver. The main difference is that it has been written to make use of the Plugin architecture to configure the job’s parameters dynamically.
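Since the DAO and PluginController are left to the reader, here is one minimal sketch of how the pair could fit together. It is not the author's implementation: the class names InMemoryConfigDao and PluginControllerSketch, the RecordValidator interface, the "validatorClass" column, and the "tradeFeed" feed name are all hypothetical, and an in-memory map stands in for the HBase table keyed by feed name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plugin contract used only for this sketch.
interface RecordValidator { boolean isValid(String record); }

class NonEmptyValidator implements RecordValidator {
  public boolean isValid(String record) {
    return record != null && !record.trim().isEmpty();
  }
}

// In-memory stand-in for the config DAO: feed name (row key) -> column -> value.
class InMemoryConfigDao {
  private final Map<String, Map<String, String>> rows = new HashMap<>();

  void put(String feedName, String column, String value) {
    rows.computeIfAbsent(feedName, k -> new HashMap<>()).put(column, value);
  }

  String get(String feedName, String column) {
    Map<String, String> row = rows.get(feedName);
    return row == null ? null : row.get(column);
  }
}

// Higher level wrapper: turns configured class names into plugin instances.
class PluginControllerSketch {
  private final InMemoryConfigDao dao;

  PluginControllerSketch(InMemoryConfigDao dao) { this.dao = dao; }

  <T> T createPlugin(String feedName, String column, Class<T> type) {
    String className = dao.get(feedName, column);
    if (className == null) {
      throw new IllegalStateException("No " + column + " configured for feed " + feedName);
    }
    try {
      // Load the configured class by name and call its no-argument constructor.
      return type.cast(Class.forName(className).getDeclaredConstructor().newInstance());
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot create plugin " + className, e);
    }
  }

  public static void main(String[] args) {
    InMemoryConfigDao dao = new InMemoryConfigDao();
    dao.put("tradeFeed", "validatorClass", NonEmptyValidator.class.getName());
    RecordValidator validator = new PluginControllerSketch(dao)
        .createPlugin("tradeFeed", "validatorClass", RecordValidator.class);
    System.out.println(validator.isValid("a,b,c"));
  }
}
```

Swapping the in-memory map for HBase or for text files on HDFS would only change the DAO; the controller and the plugins would stay the same, which is the separation described above.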
BaseMasterMapper.java – This class is an abstract base class that implements the configure method of the Mapper implementation, to make use of the DAO and PluginController already described above. It should be extended by all the Mapper implementations you use when creating a Map-Reduce job with the Master Map Reduce concept framework. In the future we might create additional helper functions in this class for the mappers to use. In the end you only need a finite number of Mapper implementations. It is envisioned that the number of mappers is related more to the number of file formats you have, not the number of feeds. The idea of the framework is not to have to write the lower level components of a Map-Reduce job at the feed level; instead developers should focus on the business logic, such as Validation logic and Transformation logic. The fact that this logic runs in a Map-Reduce job is simply because it needs to run on the Hadoop cluster. Otherwise these loader jobs execute logic like any other standard Loader job running outside of the Hadoop cluster.
BaseMasterReducer.java – Just like on the Mapper side, this class is the base class for all Reducer implementations that are used with the Master Map-Reduce Job framework. Like the BaseMasterMapper class, it implements the configure method and provides access to the DAO and PluginController for reducer implementations. Again, in the future we may expand this to include additional helper functions.
StringRecordMasterMapper.java – This is an example implementation of what a Master Mapper implementation would look like. Note that it has nothing to do with the Feed; instead it is related to the file format. Specifically, this class would make sense as a mapper for a delimited text file format.
StringRecordMasterReducer.java – This is an example implementation of what the Master Reducer would look like. It complements the StringRecordMasterMapper from above, in that it works well with text line / delimited file formats. The idea here is that the Mapper parses and transforms raw feed data into a canonical data model and outputs that transformed data in a similar delimited text file format. Most likely the Reducer implementation can simply be a pass through. It’s possible that a reducer in this case is not even needed, and we can configure the Master Map Reduce Driver to be a Map-Only job.
In the end, some may ask how much value a framework like this adds. Isn’t Map-Reduce simple enough? Well, the truth is, we need to ask this for all frameworks and wrappers we use. Is their inclusion worth it? I think in this case the Master Map Reduce framework does add value. It breaks down the Driver, Mapper, and Reducer into parts that non-Hadoop/Map-Reduce programmers are well familiar with, especially in the ETL world. It is easy for Java developers who build Loaders for a living to understand vocabulary like Validator, Transformer, Parser, OutputFormatter, etc. They can focus on writing business specific logic and they do not have to worry about the finer points of Map-Reduce. Combine this with the fact that this framework creates an environment where you can create hundreds of Map-Reduce programs, one for each feed you are loading, and each program will have the exact same Map-Reduce structure, and I believe this framework is well worth it.
Just Another Stream of Random Bits…
– Robert C. Ilardi
Back in my days at Lehman Brothers, I was introduced to the concept of “Synthetic Transactions”. That is an automated action that is scheduled to execute periodically to monitor performance and availability of one or more components in your enterprise architecture.
Most architects will use SNMP and simple pinging of servers, routers, networks, etc., and monitor things like Disk Space, CPU Usage and Memory Usage: pretty much anything that can be recorded via HP OpenView / HP BTO (Business Technology Optimization). I believe this is ok for infrastructure monitoring, but for application monitoring, which I believe gives you a better view into the health of your Enterprise Architecture as it matters to real users and clients, Synthetic Transactions are far superior.
Synthetic Transactions go further than simple network or infrastructure monitoring, and further than even simple application performance metrics monitoring with, say, a tool like ITRS’s Geneos. A Synthetic Transaction is really about testing the capabilities of your systems and applications from the viewpoint of an end user or a calling client system, to ensure that the system is available with the capabilities and performance profile agreed upon by the contract set in your requirements.
Synthetic Transactions are not always easy to implement, and great care must be put into planning the inclusion of Synthetic Transactions from the beginning of system design and architecture analysis and should be part of Non-Functional Requirements.
Also in terms of Information Security, and Intrusion Detection, Synthetic Transactions are a way to start implementing the next phase of network defenses. As you all know in today’s world, firewalls are no longer sufficient to keep the hackers out of your systems. More and more hackers have already turned to attacking specific application weaknesses instead of going after the raw network infrastructure as the infrastructure was the first and easiest way for organizations to shore up their security.
While Synthetic Transactions won’t prevent cyber attacks, or increase security by themselves, the detailed level component monitoring and performance metrics collection that Synthetic Transactions provide can potentially help identify applications or components of applications that are under attack or have been compromised due to potential performance or application behavioral issues caused by hackers attacking your applications.
Microsoft has a good outline of what a Synthetic Transaction is. Although they relate it to their Operations Manager product, the general information is valid regardless of whether you use a tool or develop your own Synthetic Transaction Agents. Specifically, Microsoft states in this article: “Synthetic transactions are actions, run in real time, that are performed on monitored objects. You can use synthetic transactions to measure the performance of a monitored object and to see how Operations Manager reacts when synthetic stress is placed on your monitoring settings. For example, for a Web site, you can create a synthetic transaction that performs the actions of a customer connecting to the site and browsing through its pages. For databases, you can create transactions that connect to the database. You can then schedule these actions to occur at regular intervals to see how the database or Web site reacts and to see whether your monitoring settings, such as alerts and notifications, also react as expected.”
Another good definition, although more of a summary than what Microsoft outlined, is available on Wikipedia in the Operational Intelligence article, specifically the section on System Monitoring, where they state: “Capability monitoring usually refers to synthetic transactions where user activity is mimicked by a special software program, and the responses received are checked for correctness.”
Although Wikipedia does not have a lot of direct information about Synthetic Transactions, I do like their term “Capability Monitoring”, which is exactly what Synthetic Transactions attempt to do: monitor the capabilities of your system at any given moment, to give you, your developers and your operations support staff a dashboard level view into how your system is performing, which components are available, and, through the performance measures, the health of each of your system’s components and therefore the overall health of your system and applications.
Back at Lehman, and if you look at the Microsoft description, most times a Synthetic Transaction focuses on a single aspect of the System; for example, checking if you are able to open a connection to a database. While this is a valid Synthetic Transaction, it is extremely simple, and may not provide you with enough information to tell if your application is actually available from an end user or client system standpoint.
What I developed as a model for Synthetic Transactions back in 2006 was the ability for my Transaction to interact with multiple tiers of my architecture, if not all tiers.
The application which I was developing Synthetic Transactions for was a Reference Data system that included Desktop and Web based Front Ends, a JavaEE (J2EE at the time) based Middleware, a Relational Database, a Workflow Engine, and a Message Publisher, among other various supporting components such as ETL processes and other batch processing.
The most useful test in this case would be one that touched the Middleware, interacted with the workflow engine, retrieved data from the database and potentially updated test records, and had those test messages published and received by the Synthetic Transaction Agent to verify the full flow of the system.
Creating the Agent:
To create the Agent that would initiate the Transactions, I used a Job scheduler such as Autosys or Control-M to schedule the process to kick off every couple of hours to collect metrics. (Since the application was a global app used 24 x 7, it was important that the application was not only available but was performant around the clock, and we needed to be alerted if the application was performing out of an acceptable range, and which component was affected.)
The Agent itself was a client of the middleware. Since all services such as the Database and the Workflow Engine were wrapped by the middleware, we could have the agent invoke different APIs that would perform a Database Search and record metrics, and call an API that would create a Workflow request, and move it automatically through the workflow steps.
At the end of the workflow, we were able to trigger the messaging publisher to broadcast a message. Since our Data Model allowed for Test records, and we built into our requirements that consumers generally filter out or otherwise ignore Test records in the message flow, we were able to send out test messages in the production environment that would not affect any of our downstream clients.
However, our Agent process could start up a message listener and listen for test records specifically. By recording the start time of the workflow transaction and the receive time of the test record message, the Agent could then calculate the round trip time of data flowing through the system.
Each individual API call from invocation to return can also be timed to test how each different API was performing.
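To make the two measurements above concrete, here is a small sketch of timing an individual API call and of measuring the round trip from workflow start to receipt of the test message. It is only an illustration under stated assumptions: the class SyntheticAgentSketch, its method names, and the metric names are hypothetical, and a BlockingQueue stands in for the real message listener.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical agent helper; a queue stands in for the real message listener.
class SyntheticAgentSketch {
  private final Map<String, Long> metricsMillis = new LinkedHashMap<>();

  // Times one API call from invocation to return and records the elapsed time.
  <T> T timed(String metricName, Supplier<T> apiCall) {
    long start = System.nanoTime();
    try {
      return apiCall.get();
    } finally {
      metricsMillis.put(metricName, (System.nanoTime() - start) / 1_000_000L);
    }
  }

  // Starts the workflow, waits for the test message, and returns the round
  // trip time in milliseconds, or -1 if nothing arrives before the timeout.
  long roundTrip(Runnable startWorkflow, BlockingQueue<String> testMessages, long timeoutMillis) {
    long start = System.nanoTime();
    startWorkflow.run();
    try {
      String message = testMessages.poll(timeoutMillis, TimeUnit.MILLISECONDS);
      if (message == null) {
        return -1L;
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return -1L;
    }
    long elapsed = (System.nanoTime() - start) / 1_000_000L;
    metricsMillis.put("roundTrip", elapsed);
    return elapsed;
  }

  Map<String, Long> metrics() { return metricsMillis; }

  public static void main(String[] args) {
    SyntheticAgentSketch agent = new SyntheticAgentSketch();
    BlockingQueue<String> listenerQueue = new LinkedBlockingQueue<>();
    // Simulated API call; a real agent would invoke the middleware here.
    String result = agent.timed("searchApi", () -> "3 rows");
    // Simulated workflow that publishes a test record straight to the listener.
    long rt = agent.roundTrip(() -> listenerQueue.offer("TEST-RECORD"), listenerQueue, 1000L);
    System.out.println(result + " " + agent.metrics() + " roundTrip=" + rt);
  }
}
```

System.nanoTime is used rather than wall clock time because it is meant for measuring elapsed intervals; a production agent would also persist the metrics instead of keeping them in memory.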
In terms of ETL, since the Data Model again allowed for test records, we were able to create a small file of test records and trigger the ETL process as well to load the test records. The records in the database would be updated, in some cases with just a timestamp update, but it would still be a valid test, and valid metrics can still be collected.
Together this gave us a good dashboard view of the system’s availability and performance at a given time. If we wanted to increase the resolution, all we had to do was decrease the period between each job start of the Agents.
We recorded the metrics in a database table, and created a simple web page, which production support teams could use to monitor the Synthetic Transactions and their reported metrics.
On a side note: If your APIs and libraries are written in Java, and already record metrics that your developers used for debugging, and Unit Testing, you can expose these directly via JMX, which can be accessed and used directly if your Synthetic Transaction Agent process(es) are also written in Java. Or you can create a separate function or API that returns the internal metrics recorded by your libraries, frameworks and API deployments.
A number of years ago, I developed a Performance Metrics object model and small set of helper functions for Java that I have been using for over a decade and I find that even today they are still the most useful performance metrics I can collect. Perhaps I will write up an article on collecting performance metrics in the applications you develop and share that simple object model and helper functions.
Automated alerts, such as paging the on call support staff, could also be accomplished by simply specifying how many seconds or milliseconds a call to an API should take; if that period is exceeded, the Agent would send out emails and paging alerts.
In the end a lot of organizations have a Global Technology and Architecture Principle that mandates all their applications have some sort of automated system testing.
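The threshold check described here can be sketched in a few lines. This is an illustration only: the class ThresholdAlertSketch, the API names, and the limits are made up, and printing stands in for sending emails or pages.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical threshold check; names and limits are illustrative only.
class ThresholdAlertSketch {

  // Returns one alert line per API whose observed time exceeds its configured limit.
  static List<String> findBreaches(Map<String, Long> observedMillis, Map<String, Long> limitMillis) {
    List<String> alerts = new ArrayList<>();
    for (Map.Entry<String, Long> e : observedMillis.entrySet()) {
      Long limit = limitMillis.get(e.getKey());
      if (limit != null && e.getValue() > limit) {
        alerts.add(e.getKey() + " took " + e.getValue() + " ms (limit " + limit + " ms)");
      }
    }
    return alerts;
  }

  public static void main(String[] args) {
    Map<String, Long> observed = new LinkedHashMap<>();
    observed.put("searchApi", 120L);
    observed.put("createWorkflowApi", 900L);

    Map<String, Long> limits = new LinkedHashMap<>();
    limits.put("searchApi", 500L);
    limits.put("createWorkflowApi", 750L);

    for (String alert : findBreaches(observed, limits)) {
      System.out.println("ALERT: " + alert); // a real agent would email or page here
    }
  }
}
```

APIs without a configured limit are ignored rather than treated as failures, which keeps newly added metrics from paging anyone before a threshold has been agreed.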
This can be accomplished by using the Synthetic Transaction paradigm.
It is worth noting that creating an architecture that supports Synthetic Transactions is not simple. You need to ensure that all components, especially your data and information models, allow for test records.
A way around the information model requirement is to Rollback all transactions on your database instead of committing them. This would force you to have a flag or special API separate from the normal data flow in your system to ensure data is not permanently written to your database. However, the issue here is that if you implement it this way, you cannot have a true end to end flow of test records in production. Still, you will be able to get most of the metrics you need.
Also, if your organization only mandates a certain level of automated testing or performance and availability monitoring, then perhaps true end to end data flow through your system is not required.
It is my experience, however, that even if the company I work for does not mandate true end to end testing, as a responsible application owner I prefer to have true end to end data flow testing available to me, so I can monitor my systems more accurately and give proper answers to stakeholders when users and client systems complain about performance or system availability.
Back in 2005, I was faced with developing a Secure Set of APIs that could run in multiple deployment configurations. At the time we were heavily developing EJBs, specifically Stateless Session Beans. We were also starting to deploy SOAP based Web Services, and we were also packaging these same APIs in the form of standalone Libraries.
On a side note this will be my first article on Information Security Topics and developing Secure Applications. I recently have become increasingly interested in Penetration Testing and other Information Security topics, and I am even enrolled in classes and other forms of training. I have created the Security Category on my blog to organize security related topics on this web site. Hope they will help all of us create more secure applications.
Combined with what I call the Data Services Architecture and the Resource Bundle / App Resource Manager framework, I was able to create an architecture leveraging Factories, Mediators, Data Access Objects, and Facades to hide from the calling clients which “Mode” the APIs were running in, whether it was EJB, Web Services, or simply running from a Locally deployed Library on the classpath.
I was faced with the challenge of ensuring that no matter which operating or deployment mode these APIs which numbers in the hundreds of individual API methods, were all secure. Not only did a calling application or user have to Authenticate with a Single Sign On Services provided by the firm, I also needed to create an Entitlements framework that would allow fine grain, down to the individual method level Authorizations for each API.
As any good Developer that has any exposure to basic Information Security and Defensive Programming Techniques, this means that we only want to login once, so that we do not have to pass credentials to each API we call, and in doing so the established design for doing so is the assignment of a Securely Randomized Unguessable Session ID. This ID does NOT have to be the HTTP Session ID, which in the case of my requirements was only available technically when developing the Web Services.
Also, depending on your Application Server configuration and firm standards, you are probably running on a multi node cluster. Some load balancers do not work well with HTTP Session Replication, and depending on firm development standards you may not even be allowed to turn on Session Replication. Some firms may even have a requirement NOT to turn on session stickiness.
My solution was to develop two components. The first is called the Stateless User Cache, which is responsible for creating and managing Sessions across Clusters of Application Servers without App Server Session Replication. It also operates correctly in standalone, locally classpath deployed environments such as Library Mode. The second is the Lightweight User Reference Object, which is the subject of this article.
We will go over the Stateless User Cache in more detail in a future blog article, but I wanted to mention it here because it is tied to the Lightweight User Reference Object.
So basically I provide an API, usually called ssoLogin, which wraps the firm's Single Sign On Service, whether it's something like authenticating against LDAP or Active Directory, or a vendor product such as SiteMinder.
The ssoLogin method will NOT return a User object which contains all entitlements. Instead it will leverage the Stateless User Cache to create a new “Session”, store the User object in that session, and return a “Reference” or “Pointer” object to that session.
In this case you can think of it as an Object form of a Session ID.
The Class looks something like this:
import java.io.Serializable;

public class UserRef implements Serializable {
    private static final long serialVersionUID = 1L;

    private String sessionId;
    private long loginTimestamp;
    private long lastTouchTimestamp;
    private String userId; // Insecure if the user id is Private, see notes below.

    // Getter and Setter methods…
    // HashCode and Equals methods…
}
Basically, as you can see, the UserRef object provides three to four pieces of information. The fourth, the userId, can be the username, a unique surrogate key, or even better a transient key that does not map directly to the real user id stored in the database.
However, it can be the real username or surrogate key depending on the application security requirements. Take for example the case of an Instant Messaging Application. The Username is public information, and it makes sense for the client to have the list of usernames that the currently logged in user has on their buddy, friends, or contact list. In this case there is no real security issue with storing the username in this field because it is publicly shared information.
However, in applications where usernames and ids never need to be shared, we should leave this field Null, or remove it from the UserRef object itself.
One advantage of having the userId in the UserRef shows up when the same user or application logs in more than once and you want to tie different Session Ids to the same user, for example because the client needs to look up the other sessions or communicate with them in some way.
Now as a side note, technically this user id, whether it is the real one or a transient one that is securely generated and mapped on the server side to the real underlying user id, does not need to be sent back to the client. The unique session id is good enough, and you can store the user id for sessions owned by the same user on the server side, which is much more secure. However, I have found by experience that in my enterprise applications I sometimes need to expose the username or user id to the client side, and I usually do this through UserRef. Again, you need to work through Security Use Cases to determine whether having this bit of information opens any vulnerabilities in your applications and whether any exploits could be created to take advantage of them. Vulnerabilities this may open up include Username recon and collection, potential Spear Phishing attacks, and User Id enumeration if the Ids are insecurely generated, such as simple sequence numbers.
In any case, the UserRef requires at minimum the sessionId field, and the other information can be added or removed as your applications require. However, the more the client side needs to do without communicating with the server, the more information you may need to include in your lightweight User Reference Object. This is especially true if the API suite is used not only by Web Applications but also by Desktop Applications, Batch Applications, or Server side Daemon Processes.
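To make the login flow described above more concrete, here is a minimal sketch, not my production code. The SsoService interface, the LoginService class, the trimmed-down User and UserRef classes, and the in-memory map standing in for the Stateless User Cache are all assumptions made for illustration. The session id here comes from UUID.randomUUID().

```java
import java.io.Serializable;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical full user record with entitlements; it never leaves the server.
final class User {
    final String userId;
    final Set<String> entitlements;

    User(String userId, Set<String> entitlements) {
        this.userId = userId;
        this.entitlements = entitlements;
    }
}

// Trimmed-down stand-in for the UserRef class shown earlier.
final class UserRef implements Serializable {
    private static final long serialVersionUID = 1L;
    final String sessionId;
    final long loginTimestamp;

    UserRef(String sessionId, long loginTimestamp) {
        this.sessionId = sessionId;
        this.loginTimestamp = loginTimestamp;
    }
}

// Hypothetical wrapper around the firm's Single Sign On service.
interface SsoService {
    // Returns the full user on success, or null when authentication fails.
    User authenticate(String username, char[] password);
}

final class LoginService {
    // In-memory stand-in for the Stateless User Cache (single JVM only).
    private final Map<String, User> sessions = new ConcurrentHashMap<>();
    private final SsoService sso;

    LoginService(SsoService sso) {
        this.sso = sso;
    }

    // Authenticates, stores the full User server side, returns only a reference.
    UserRef ssoLogin(String username, char[] password) {
        User user = sso.authenticate(username, password);
        if (user == null) {
            throw new SecurityException("Username and/or Password are invalid.");
        }
        String sessionId = UUID.randomUUID().toString();
        sessions.put(sessionId, user);
        return new UserRef(sessionId, System.currentTimeMillis());
    }

    // Server-side lookup of the full User behind a reference, or null if unknown.
    User lookup(UserRef ref) {
        return sessions.get(ref.sessionId);
    }
}
```

The point of the sketch is that the caller only ever holds the UserRef, while the User object and its entitlements stay in the server-side map.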
The next step is to require all developers on your team to include UserRef as a mandatory Parameter in all their API methods.
Then you can use the Stateless User Cache, if you have something similar to it, or the HTTP Session, with the UserRef object as the Key to look up the full User object which contains the User Entitlements.
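As a rough illustration of that lookup, the sketch below resolves a UserRef to the full User on the server. The UserSessionStore class, its idle timeout, and the minimal UserRef and User classes are hypothetical and only stand in for the real Stateless User Cache, which is not shown in this article. Time is passed in as a parameter to keep the example easy to follow.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal stand-in for the UserRef class shown earlier.
final class UserRef {
    final String sessionId;

    UserRef(String sessionId) {
        this.sessionId = sessionId;
    }
}

// Hypothetical server-side user record; entitlements omitted for brevity.
final class User {
    final String userId;

    User(String userId) {
        this.userId = userId;
    }
}

// Hypothetical single-JVM store; not the real Stateless User Cache.
final class UserSessionStore {
    private static final class Entry {
        final User user;
        volatile long lastTouchMillis;

        Entry(User user, long nowMillis) {
            this.user = user;
            this.lastTouchMillis = nowMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long idleTimeoutMillis;

    UserSessionStore(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    void register(UserRef ref, User user, long nowMillis) {
        entries.put(ref.sessionId, new Entry(user, nowMillis));
    }

    // Returns the full User, or null if the reference is unknown or idle too long.
    User resolve(UserRef ref, long nowMillis) {
        if (ref == null || ref.sessionId == null) {
            return null;
        }
        Entry entry = entries.get(ref.sessionId);
        if (entry == null) {
            return null;
        }
        if (nowMillis - entry.lastTouchMillis > idleTimeoutMillis) {
            entries.remove(ref.sessionId); // expire the idle session
            return null;
        }
        entry.lastTouchMillis = nowMillis; // "touch" the session on every use
        return entry.user;
    }
}
```

Note that the last touch time that matters for expiry is tracked on the server, so a client cannot extend its own session by editing the timestamps in its copy of the reference.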
In a future post I will do a write up on my Entitlements Object Model so you can see how I store Entitlements or Authorization information in memory.
Usually I will create methods such as public boolean hasAccess(UserRef userRef, String apiName) throws AppSecurityException;
I then require all my developers to ensure that the Mediator or Facade code, which hides all Data Access Objects and other Service Handler objects, first checks whether the user has access to the method by making a call to “hasAccess”.
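Here is a small sketch of what such a guarded Facade method might look like. Only the hasAccess signature comes from the text above. The EntitlementChecker and AccountFacade classes, the getBalance API name, and the map of entitlements per session are made up for the example and are not my actual Entitlements Object Model.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for the application's security exception type.
class AppSecurityException extends Exception {
    private static final long serialVersionUID = 1L;

    AppSecurityException(String message) {
        super(message);
    }
}

// Minimal stand-in for the UserRef class shown earlier.
final class UserRef {
    final String sessionId;

    UserRef(String sessionId) {
        this.sessionId = sessionId;
    }
}

// Hypothetical checker: maps a session id to the API names it may call.
final class EntitlementChecker {
    private final Map<String, Set<String>> allowedBySession = new ConcurrentHashMap<>();

    void grant(String sessionId, Set<String> apiNames) {
        allowedBySession.put(sessionId, apiNames);
    }

    public boolean hasAccess(UserRef userRef, String apiName) throws AppSecurityException {
        if (userRef == null || userRef.sessionId == null) {
            throw new AppSecurityException("Missing user reference.");
        }
        Set<String> allowed = allowedBySession.get(userRef.sessionId);
        if (allowed == null) {
            throw new AppSecurityException("Session is invalid or has expired.");
        }
        return allowed.contains(apiName);
    }
}

// Hypothetical Facade: every API method takes a UserRef and checks access first.
final class AccountFacade {
    private final EntitlementChecker checker;

    AccountFacade(EntitlementChecker checker) {
        this.checker = checker;
    }

    String getBalance(UserRef userRef, String accountId) throws AppSecurityException {
        if (!checker.hasAccess(userRef, "getBalance")) {
            throw new AppSecurityException("Access denied.");
        }
        // Only after the check would the real code delegate to a Data Access Object.
        return "balance-for-" + accountId;
    }
}
```

Putting the check on the first line of every method keeps the pattern easy to spot during a code review.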
It's easy to do a code review or even write a script that automatically scans your source code to ensure every method has a call to hasAccess.
One important note: the login method, in this case “ssoLogin”, would normally be the only method that does not make a call to hasAccess, as all users should have implicit access to this method. Even users that do not exist in your security databases or LDAP directories will simply get a Login failed message.
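A very rough sketch of such a scanner is below. It is a heuristic I am making up for illustration, not a real parser: the HasAccessScanner class assumes that every API method takes UserRef as its first parameter, finds those method headers with a regular expression, and counts braces naively, so braces inside string literals or comments would confuse it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical heuristic scanner; a real tool should use a proper Java parser.
final class HasAccessScanner {
    // Matches "name(UserRef ...) [throws ...] {" and captures the method name.
    private static final Pattern METHOD_HEADER = Pattern.compile(
            "\\b(\\w+)\\s*\\(\\s*UserRef\\b[^)]*\\)\\s*(?:throws\\s+[\\w.,\\s]+)?\\{");

    // Returns the names of UserRef-taking methods whose body never calls hasAccess.
    static List<String> methodsMissingCheck(String source) {
        List<String> missing = new ArrayList<>();
        Matcher matcher = METHOD_HEADER.matcher(source);
        while (matcher.find()) {
            String name = matcher.group(1);
            int bodyStart = matcher.end();
            int depth = 1;
            int i = bodyStart;
            // Walk forward until the brace that opened the method body is closed.
            while (i < source.length() && depth > 0) {
                char c = source.charAt(i++);
                if (c == '{') {
                    depth++;
                } else if (c == '}') {
                    depth--;
                }
            }
            String body = source.substring(bodyStart, i);
            // Skip the declaration of hasAccess itself.
            if (!name.equals("hasAccess") && !body.contains("hasAccess(")) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

You could feed it the contents of each Facade source file and fail the build when the returned list is not empty. The ssoLogin method is skipped naturally because it does not take a UserRef.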
Remember, do not give potential hackers a hint when they have guessed a username correctly. Instead use the generic Login Failure Message: “Username and/or Password are invalid.”
In this case the system does not give them a hint whether the username actually exists or if they simply got the password incorrect.
Finally, since the UserRef object is small, it has a smaller impact on I/O when transferring the object remotely via EJB or Web Services calls. It has a much smaller I/O footprint than passing the entire User object, which besides being highly insecure, can also be a performance issue.
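One simple way to enforce this, sketched here under my own assumptions, is to keep the real failure reason on the server and map every reason to the same client message. The LoginFailureReason enum and LoginFailureMessages class are hypothetical names used only for this example.

```java
import java.util.logging.Logger;

// Hypothetical internal reasons a login can fail; never shown to the caller.
enum LoginFailureReason {
    UNKNOWN_USER,
    WRONG_PASSWORD
}

final class LoginFailureMessages {
    private static final Logger LOG =
            Logger.getLogger(LoginFailureMessages.class.getName());

    static final String GENERIC_MESSAGE = "Username and/or Password are invalid.";

    // Records the real reason server side and returns the same text for every reason.
    static String messageFor(LoginFailureReason reason) {
        LOG.fine("Login failed: " + reason);
        return GENERIC_MESSAGE;
    }
}
```

Because the client-facing text is identical in both cases, the response itself does not reveal whether the username exists.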
Let me know what you think of my User Reference Object and solutions to securing APIs or for that matter any method you want secure. I would love to hear from Developers and Penetration Testers alike!
Finally, and I will probably write an entire post on this, but you can find plenty of information out there on the web: make sure that when you generate your own session id, you use secure randomization so the Session Id token is unguessable and cannot be enumerated through a simple algorithm.
In Java there are two very simple solutions: use the SecureRandom class and NOT Math.random or the Random class, or use the UUID class, whose randomUUID method creates a random (type 4) globally unique identifier.
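Both options can be sketched in a few lines. The SessionIds class name, the 32 byte token length, and the URL-safe Base64 encoding are my own choices for the example, not requirements.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.UUID;

// Hypothetical helper showing the two ways of generating a session id.
final class SessionIds {
    // SecureRandom is thread safe, so one shared instance is fine.
    private static final SecureRandom RANDOM = new SecureRandom();

    // Option 1: 32 random bytes from SecureRandom, encoded as URL-safe text.
    static String newSessionId() {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Option 2: a random (type 4) UUID rendered as its standard 36 character string.
    static String newUuidSessionId() {
        return UUID.randomUUID().toString();
    }
}
```

The SecureRandom version gives you control over the token length, while the UUID version is a one liner.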