Sunday, March 13, 2016

Learning Apache Hadoop OREILLY Course

we should know the concept of disk striping. in disk striping, or RAID 0, the data is divided into multiple chunks, so let's say you have 4 hard disks, the data will be divided into 4 pieces, and by that accessing the data becomes much faster.

in RAID 1, we mirror the data

in order to ensure that your data is safe you should combine RAID 1 with RAID 0.

Hadoop logically does that in the cluster: it stripes and mirrors the data.

- Hadoop is fault tolerant, which means that if a disk is corrupted or a network card stops working, this is fine.
- Hadoop has a master/slave structure.

MASTER NODE COMPUTER FEATURES
you should choose a powerful and expensive computer for your master node.
the master node is a single point of failure,
you should have 2 or 3 master nodes in a cluster
you should have redundancy (as it is a SPOF)

you need a lot of RAM, more than 25 GB, as the daemons take a lot of RAM
you should use RAID
you should use hot-swap disk drives
you should have a redundant network card
you should have a dual power supply

the bottom line: the MASTER NODE should never go down.

CPU is not as important as RAM here.

SLAVE NODE COMPUTER FEATURES
you will have 4 to 4000 slave nodes in a cluster
slave nodes are not single points of failure.
7200 RPM disks are fine
more disks are better, which means 8 * 1 TB is much better than 4 * 2 TB
it is better that all slaves have the same disk size.

of course a slave IS NOT redundant
you don't need RAID, a dual network card, or a dual power supply

you need a lot of RAM

SLAVE SIZE EXAMPLE
let's say you have 10 TB of new data every month
you have slaves with 8 TB of raw disk
you have a replication factor of 3
you should know that there is something called "intermediate data", which is the data generated between MAP and REDUCE. this data is about 25% of the disk size (in this case 2 TB)

the available space formula is: (RAW - intermediate data) / replication factor = (8 - 2) / 3 = 2 TB

which means each slave effectively holds 2 TB, not 8 TB, which means you need 5 slaves every month (as you get 10 TB every month).
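a quick sanity check of that arithmetic, just re-running the course's rule of thumb in Java (the numbers are the ones from the example above):

// recomputes the capacity-planning example above
public class SlaveSizing {
    public static void main(String[] args) {
        double rawTb = 8.0;                        // raw disk per slave (TB)
        double intermediateTb = rawTb * 0.25;      // ~25% reserved for intermediate MAP/REDUCE data
        int replicationFactor = 3;
        double usablePerSlaveTb = (rawTb - intermediateTb) / replicationFactor;   // 2 TB
        double monthlyDataTb = 10.0;
        System.out.printf("usable per slave: %.1f TB, slaves needed per month: %.0f%n",
                usablePerSlaveTb, Math.ceil(monthlyDataTb / usablePerSlaveTb));   // 5 slaves
    }
}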

HADOOP ECOSYSTEM
it means all the things on top of Hadoop;
basically, when we say Hadoop we mean HDFS and MAPREDUCE.
the main daemons in Hadoop are
1- Name Node: part of the master
2- Secondary Name Node: part of the master
3- Job Tracker: part of the master
4- Data Node: part of the slave
5- Task Tracker: part of the slave

SOME ECOSYSTEM PROJECTS
1- HBase: fast, scalable NoSQL database
2- Hive: write SQL-like queries instead of MapReduce
3- Pig: write functional queries instead of MapReduce
4- Sqoop: pull and push data to an RDBMS, used for integration
5- Flume: pull data into HDFS
6- Hue: web interface for users
7- Cloudera Manager: web interface for admins to manage the cluster
8- Oozie: workflow builder
9- Impala: real-time SQL queries, roughly 70x faster than MapReduce.
10- Avro: serialize complex objects to store in Hadoop
11- Mahout: machine learning on Hadoop
12- ZooKeeper
13- Spark
14- YARN
15- Storm

NOTES
Hadoop is used for batch processing, which means parallelization, which means problems like graph-based ones don't fit well with Hadoop

SOME HADOOP SUPPORT COMPANIES:
the best is Cloudera
Hortonworks
MapR


//////////////////////////////////////////////////

when we talk about Hadoop, we are talking about 2 main things
1- storage: which is HDFS, a distributed, redundant storage layer
2- processing: which is MapReduce, a distributed processing system

some terminology to know:
1- a job: all the tasks that need to run on all the data.
2- a task: an individual unit of work, which is either a map or a reduce
3- Slave/Master: these are computers
4- NameNode, DataNode: these are daemons, which means JVM instances


we have MapReduce v1: old and stable
we have MapReduce v2: new things like dynamic allocation and scalability


Hadoop cluster has 5 daemons:
- Storage Daemons:
NameNode(on Master)
SecondaryNameNode(Master)
DataNode(Slave)
- Processing Daemons:
JobTracker(Master)
TaskTracker(Slave)


Master Daemons are for orchestration
Slave Daemons are for working

NameNode: handles the storage metadata; it keeps this information in memory for fast access but also persists it.
Secondary Name Node: it periodically checkpoints the NameNode metadata (more on this later); it is not a failover node
Job Tracker: coordinates processing and scheduling.


NOTE: use different machines for the Name Node and the Secondary Name Node, because if the Name Node machine goes down, the Secondary Name Node still holds a recent copy of the metadata that can be used to rebuild the Name Node

NOTE: you can install the Job Tracker on the same machine as the Name Node, and move it to another machine when your cluster gets bigger.

Data Node: handles the raw data (read & write)

Task Tracker: handles individual tasks (map or reduce)

the data nodes and task trackers constantly send heartbeats to the master to say "we are alive and this is what we are working on".


///////////////////////
Hadoop run modes

1- Local JobRunner: a single computer, everything in a single JVM (no separate daemons), good for debugging
2- Pseudo-distributed: a single computer, 5 JVMs (one for each daemon), good for testing
3- Fully distributed: multiple computers, multiple JVMs, this is the real environment.

when you install Hadoop it is recommended to use Linux; use RHEL for the master and CentOS for the slaves.

use Red Hat Kickstart to install Hadoop on multiple machines.

///////////////////////////////////////
Elastic Map Reduce

is a solution from Amazon similar to Hadoop.

it has this structure:



the master instance group: is like the master node
the core instance group: is like the slave node, but it is only responsible for storage (as you can see, it uses HDFS)
the task instance group: is like the slave node, but it is only responsible for processing (doing the map/reduce work)

usually we use S3 to write information and intermediate data.

the core instance group is static: you cannot add new machines after you start the cluster; however, the task instance group is not static, you can add new machines whenever you want

//////////////////////////////////////////

HADOOP LAB 1
in this lab he created 5 EC2 instances, one master and 4 slaves
he installed Cloudera Manager
he installed Hadoop from Cloudera Manager
then he uploaded some data to Hadoop from the command line
then he ran a Map/Reduce example
then he checked everything from Cloudera Manager

then he gave an example of downloading the Cloudera QuickStart VM to install Hadoop locally

he used an Ubuntu 12.04 AMI

///////////////////////////////////////////////////////

Hadoop Distributed File System (HDFS)

you can use HDFS without MAP/REDUCE; in that case you only need the NameNode, SecondaryNameNode, and DataNodes

when you upload a file to HDFS it will be divided into blocks and stored on the slave nodes

every block will be replicated to 3 machines (by default)

you cannot edit or append to a file you upload to HDFS; if you want to change anything you should delete and recreate the file.

the default block size is 64 MB, however it is recommended to change it to 128 MB

NAME NODE:
it is a master daemon
it holds only metadata about the files that are stored on the slaves (e.g. the name of the file, permissions, where the blocks exist).

IF YOU LOSE YOUR NAME NODE YOU LOSE YOUR HADOOP SYSTEM

the client asks the name node about the file, then the client goes and reads it directly from the slave nodes.

the name node metadata lives in RAM, however it is also persisted.

we have 2 files for the persisted metadata in the name node:
1- FSIMAGE: a point-in-time image of the information that exists in HDFS
2- edit log: the changes that happened since the FSIMAGE was created; it stores the delta information

every now and then the FSIMAGE and edit log are merged and saved to disk

you have to have multiple hard disks with RAID to ensure that you will not lose this data.
it is also better to keep a copy on a remote NFS mount.
and take daily or weekly backups.

HEARTBEAT:
every 3 seconds the datanode sends a heartbeat to the name node
if 30 seconds pass without a heartbeat, the node is considered out
after 10 minutes with no heartbeat, Hadoop will start copying the data that should be on that node to other machines.

every hour (or after a restart of the name node) all data nodes send a block report, which is a list of all the blocks they have.

Hadoop uses checksums to ensure that data is transferred correctly.
every 3 weeks Hadoop does a general checksum verification on all blocks.

//////////////////////////////////////

How writing happens in Hadoop


here is an example



so the client divides the file into 4 pieces,
it asks the name node to write the first piece,
the name node gives back a pipeline, which is: write to datanode A, then C, then F
the client writes to A, then A writes to C, then C writes to F
F acks C, C acks A, A acks the client, and the client acks the name node and requests a pipeline for the next block.


how do we handle a failed node?

let's say DN_A is bad: the client will try with C, and if that fails, with F.
as long as the client is able to write to at least one node, it can move on to the next block

general information:
1- a checksum is used for each block
2- the file is considered to be the number of blocks written so far, so let's say your file is 4 blocks and you wrote only 2 blocks; at this point your file is only 2 blocks, and HDFS will see your file as 2 blocks.
for that reason it is better to have 2 folders: INCOMING, where you keep files that are still being uploaded, and READY_TO_PROCESS, where you move a file once the whole upload has finished.


How reading is handled

the client asks for a file, and the Name Node gives back a read pipeline for each block

//////////////////////
Secondary Name Node

as we mentioned before, we have 2 files in the NameNode: fsimage, which is a point-in-time snapshot, and the edit log, which is the delta since the last fsimage

Note: we have 2 files, fsimage and edit log, because fsimage is a big file, and working with a big file on every change would slow Hadoop down; that's why we have the edit log, a small file that contains only the delta information. using the edit log means dealing with a small file ==> better performance

THE SECONDARY NAME NODE COPIES THE FSIMAGE AND EDIT LOG AND MERGES THEM TO PRODUCE A NEW FSIMAGE, THEN MOVES IT BACK TO THE NAME NODE

THE SECONDARY NAME NODE IS NOT A HOT FAILOVER

IF THE SECONDARY NAME NODE IS DOWN nothing happens immediately: the name node keeps writing to the edit log, the edit log becomes bigger and bigger, and the system becomes slower and slower.

/////////////////////////////////////////////////////

new lab: we used hadoop fs -put

when you do the installation with Cloudera Manager, a trash directory is created for you by default; when you delete something it will be moved to the trash directory.
if the directory is not created, it is recommended to create one.
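the lab uses the hadoop fs command line; for reference, roughly the same upload can be done from Java through the HDFS FileSystem API (the file paths below are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPut {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);
        // same effect as: hadoop fs -put /tmp/shakespeare.txt /user/training/shakespeare.txt
        fs.copyFromLocalFile(new Path("/tmp/shakespeare.txt"),
                             new Path("/user/training/shakespeare.txt"));
        fs.close();
    }
}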

////////////////////////////////////////////////////

High Availability Name Node
having the Name Node as a single point of failure is not acceptable;
that's why there is a new solution (pushed by Cloudera), introduced in Hadoop 2, called Name Node High Availability.


as you can see, the Standby NameNode will take over if the Name Node goes down, AND YOU DON'T HAVE TO RESTART THE WRITE OR READ OPERATIONS FROM THE BEGINNING.

IMPORTANT NOTE: the DataNodes report to both the NN and the Standby NN, so both of them have a complete picture of what is happening, in memory.


with the architecture above I can handle the failure of the NAME NODE; however, the fsimage and edit log are still a single point of failure.
that is why the High Availability design introduced a new component called the JournalNode.

the current active name node now writes the edits synchronously to a set of journal nodes, and the standby NN reads from these nodes

in order for the name node and the standby name node not to misunderstand each other (e.g. both thinking they are the active name node), an epoch number is attached to each write to the JournalNodes


HOW WE ENSURE AUTOMATIC FAILOVER

we use a cluster of ZooKeeper nodes to determine who the active name node is (the number should be odd, to avoid split brain).
as you can see we have a ZKFC service on both the NameNode and the Standby Node; they report the health of their node to the ZooKeeper cluster,
and if the NAME NODE's ZKFC notices that the NameNode is down, it sends this information to ZooKeeper, and ZooKeeper makes the standby node the active name node and the old name node the standby one

HIGH AVAILABILITY IS COMPLICATED
as you can see HA is complicated: extra machines, extra configuration ...
you don't need this most of the time; the secondary name node on a different machine is usually enough.

NAME NODE FEDERATION

scale the name node by breaking the namespace up across multiple machines.

HDFS ACCESS CONTROL
Hadoop has authorization, but it doesn't have authentication. for example, let's say you are sending a write request to machine1 as user xxx; user xxx is not authorized to write but user yyy is. simply create a user yyy locally and send the request as yyy; Hadoop will not check that you really are yyy.

to get authentication you should use something else: Kerberos.

Hadoop uses Linux-like permissions.

////////////////////////////////////////////////////////////////////////////////////////////////

MAP REDUCE

these are the players of map reduce



and here is how the job is done




we also have a new version, called MapReduce v2; in this version they focused on the scalability of the job tracker and on removing the restriction on the number of map and reduce tasks that can run on each slave machine.

the Map Reduce configuration files are:
1- mapred-site.xml
2- hadoop-env.sh

MAP REDUCE LAB
in this lab he gave an example of how to run a Java map reduce job


this is the statement to run a map reduce job; hadoop-examples.jar contains the Map and Reduce Java classes.

he went over every line of code, you can check it.
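the sources of that jar are not reproduced in these notes; a minimal word-count style job (the classic example that ships with Hadoop) looks roughly like this:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);          // emit (word, 1) for every token
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();                    // add up all counts for this word
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // combiner = pre-reducer on the map side
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}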

How MapReduce works in detail

so to summarize: the job tracker asks the name node where the blocks are, assigns some slaves to do the map tasks, then assigns one or more slaves to do the reduce task; the reduce task trackers WILL COPY THE OUTPUT OF THE MAP TASK TRACKERS TO THEIR LOCAL MACHINES.

SPECULATIVE EXECUTION


Hadoop is rack aware




///////////////////////////////////////////////////////////////

Advanced MapReduce: Partitioners, Combiners, Comparators, and more

firstly we should know that the Mappers and Reducers do some kind of sorting


the mapper sorts by key, and the reducer, after the shuffle, also sorts by key.

you can define a Comparator to do secondary sorting, i.e. sort the values within the Reducer; so in the example above, where we have us:[55,20], secondary sorting will turn it into us:[20,55].

we can also define what we call a combiner, which is a pre-reducer; the combiner runs in the map phase. as you can see in the example above, the first mapper adds up the US values and the output is 55: that is the combiner's job.
with a combiner you may reduce the processing time and the amount of intermediate data.

we also have something called a partitioner


the mapper can partition its output into multiple partitions, and later each reducer fetches the partition that it is interested in;
in the example above we partitioned by key, and as you can see each reducer grabs a specific key.

There is a full example about writing a Partitioner.
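that example is not copied here, but a partitioner is just a class with one method; a minimal sketch (the class name and key type are made up for illustration):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// routes map output to a reducer based on the key, so each reducer gets all values for "its" keys
public class CountryPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // hashCode() can be negative, so mask the sign bit before taking the modulo
        return (key.toString().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

it would be wired into a job with job.setPartitionerClass(CountryPartitioner.class) and job.setNumReduceTasks(...).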

///////////////////////////////////////
LOGGING



for unit testing you have MRUnit, which is a new Apache project.

he gave a practical example about logging as well

TERASORT
when you do benchmarking we talk about the TeraSort number; the number gives us an indicator of the performance of the cluster, and of whether adding a new machine gave us a performance gain.

TERASORT is simply a simple, or let's say the simplest, MapReduce job Hadoop can do. to run a TERASORT test you use 3 tools:
1- teragen: generates a dataset
2- terasort: a job that sorts the dataset.
3- teravalidate: validates that the dataset got sorted.


///////////////////////////////////////////////
Hive vs Pig Vs Impala

we know Hive and Pig; we know that they simply convert your requests into MapReduce jobs.
they are in general 10-15% slower than native Java MapReduce.

as Hive and Pig convert the requests to MapReduce, they use the job tracker and the task trackers

Impala is developed by Cloudera and is designed for real-time queries; it uses its own dedicated daemons, not the task trackers and job tracker. IMPALA DOESN'T USE MAPREDUCE AT ALL.
Impala is not fault tolerant. basically, MapReduce is slow because of the time needed to start the JVMs for the map and reduce tasks; Impala avoids that by using its own long-running daemons.
Impala sits on top of Hive, so it uses Hive (its query language is actually a subset of HiveQL)


////////////////////////////////////////////////////

HIVE

in HIVE, you can do the installation on each client and start issuing queries.


or, you can have a HIVE server:



we always need a metastore, where we store the mapping between HIVE tables and HDFS data.

NOTE: in HiveQL there is no UPDATE or DELETE, as Hive runs on top of Hadoop and, as we mentioned before, you cannot update or delete a record in HDFS.

Check the HIVE & PIG LAB.

CHECK THE LIPSTICK PROJECT FROM NETFLIX TO CHECK PIG PERFORMANCE


/////////////////////////////////////////////////////////////


Data Import and Export

we have 2 types of import and export:
1- Real-Time Ingestion and Analysis:
products like Flume, Storm, Kafka, and Kinesis
the idea of these products is that you have multiple agents that push and pull data from each other.

these systems don't care whether the end system is Hadoop, a NoSQL store, or a flat file

the products are similar, however Storm, Kafka and Kinesis have more analysis functionality than Flume

2- Database Import/Export:
Sqoop (SQL to Hadoop)
it is simply a single process that imports/exports data to/from Hadoop.

there is no analysis or filtering or anything else, just import and export.

you can do something like: at 2:00, pull all data from Hadoop and put it into table xxx.





FLUME

Flume is used to move massive amounts of data from system A to system B (which is usually HDFS, MongoDB, a NoSQL store ...)

he talked about the architecture of Flume and there is a LAB.


///////////////////////////////
HttpFS VS WebHDFS



some REST call examples


FuseHDFS



SQOOP
he gave a lab about sqoop

Oozie
Oozie is used to build workflows; a workflow is represented in XML format



Tuesday, March 8, 2016

EJB 3.1 cookbook

Chapter 1: Getting Started with EJBs


Creating a simple session EJB





a simple example with the @Stateless annotation.

it is important to note that @Stateless can take parameters


mappedName is used as a JNDI name.
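the book's listing is not copied here; a minimal sketch of such a bean (the method body is made up):

import javax.ejb.Stateless;

// mappedName is a vendor-specific JNDI name (GlassFish uses it)
@Stateless(mappedName = "salutationBean")
public class Salutation {
    public String getFormalSalutation(String name) {
        return "Dear " + name;
    }
}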


Accessing a session bean using dependency injection



as you can see, we inject the bean using @EJB.

and here we used @WebServlet to define a servlet.
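a minimal sketch of such a servlet, reusing the hypothetical Salutation bean from above:

import java.io.IOException;
import java.io.PrintWriter;
import javax.ejb.EJB;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = "/SalutationServlet")
public class SalutationServlet extends HttpServlet {

    @EJB
    private Salutation salutation;   // injected by the container

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        response.setContentType("text/html");
        try (PrintWriter out = response.getWriter()) {
            out.println("<html><body>"
                    + salutation.getFormalSalutation("world")
                    + "</body></html>");
        }
    }
}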

Accessing the session bean using JNDI



as you can see, we use InitialContext() and lookup() to find the bean.

we used this path in the code
"java:global/SalutationApplication/SalutationApplication-ejb/Salutation"
which is 
java:global[/<app-name>]/<module-name>/<bean-name>

java:global: search among all beans that are globally accessible
java:app: search among all the beans visible within the same application
java:module: search among all the beans visible within the same module

IMPORTANT: the bean can be packaged in application-ejb.jar or application-war.war; the application-ejb.jar can itself be packaged inside application.ear.

so the <app-name> applies when you are packaged inside an application.ear
the module-name is the name of the .war or the ejb .jar
the bean-name is the name of the bean.

so if you want to search for beans inside a module :
java:module/<bean-name>

searching inside the application:
java:app/<module-name>/<bean-name>

we will see later that the bean can implement a local interface and/or a remote interface

public class Salutation implements SalutationLocalInterface, SalutationRemoteInterface {
}

in this case your JNDI lookup will be

java:global[/<app-name>]/<module-name>/Salutation/SalutationLocalInterface
java:global[/<app-name>]/<module-name>/Salutation/SalutationRemoteInterface
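a sketch of the lookup itself, using the global name quoted above (it has to run inside the container or an application client so that the naming context is available):

import javax.naming.InitialContext;
import javax.naming.NamingException;

public class SalutationLookup {

    public Salutation lookupSalutation() throws NamingException {
        InitialContext context = new InitialContext();
        // portable global name: java:global[/<app-name>]/<module-name>/<bean-name>
        return (Salutation) context.lookup(
                "java:global/SalutationApplication/SalutationApplication-ejb/Salutation");
    }
}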



Creating a simple message-driven bean


as you can see we use @MessageDriven, implement MessageListener, and override onMessage.

mappedName is the name of the queue we are going to listen to,
and we added some configuration, namely the acknowledgeMode and the destinationType.
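a minimal sketch of such an MDB (the queue name is hypothetical):

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

@MessageDriven(mappedName = "jms/SomeQueue", activationConfig = {
        @ActivationConfigProperty(propertyName = "acknowledgeMode",
                                  propertyValue = "Auto-acknowledge"),
        @ActivationConfigProperty(propertyName = "destinationType",
                                  propertyValue = "javax.jms.Queue")
})
public class SomeMessageBean implements MessageListener {

    @Override
    public void onMessage(Message message) {
        try {
            if (message instanceof TextMessage) {
                System.out.println("Received: " + ((TextMessage) message).getText());
            }
        } catch (JMSException e) {
            e.printStackTrace();
        }
    }
}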


Sending a message to a message-driven bean


as you can see, you need a queueConnectionFactory and a queue.
we create a connection
then we create a session
then we create a producer
then we send the message.


you can also see that we injected the resources (the connection factory and the queue, which you usually create in GlassFish) using @Resource
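a sketch of the sending side, written here as a stateless bean so that @Resource injection works (the JNDI names are hypothetical):

import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

@Stateless
public class MessageSender {

    @Resource(mappedName = "jms/SomeConnectionFactory")   // created in GlassFish
    private ConnectionFactory connectionFactory;

    @Resource(mappedName = "jms/SomeQueue")
    private Queue queue;

    public void send(String text) throws JMSException {
        Connection connection = connectionFactory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            TextMessage message = session.createTextMessage(text);
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}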

Accessing an EJB from a web service (JAX-WS)

firstly we will define a singleton bean


simply we use @Singleton

then we will use this bean inside a JAX-WS web service


as you can see we use @WebService and @WebMethod to define a service.

and of course we use @EJB to inject the bean.
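the book's listings are not copied here; a minimal sketch of both pieces (class and method names are made up, and the two classes go in separate files):

// --- CounterBean.java ---
import javax.ejb.Singleton;

@Singleton
public class CounterBean {
    private int count;

    public int next() {
        return ++count;
    }
}

// --- CounterService.java ---
import javax.ejb.EJB;
import javax.jws.WebMethod;
import javax.jws.WebService;

@WebService
public class CounterService {

    @EJB
    private CounterBean counterBean;   // the singleton is injected into the web service class

    @WebMethod
    public int nextValue() {
        return counterBean.next();
    }
}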

Accessing an EJB from a web service (JAX-RS)


firstly we will define a Stateless bean



then we can define


as you can see we define @Path, @GET, @POST ...
and of course @EJB is used to inject the bean.
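a minimal sketch (names are made up; making the resource class itself a stateless bean is one common way to get @EJB injection working):

// --- GreetingBean.java --- (a hypothetical stateless bean)
import javax.ejb.Stateless;

@Stateless
public class GreetingBean {
    public String greet(String name) {
        return "Hello " + name;
    }
}

// --- GreetingResource.java ---
import javax.ejb.EJB;
import javax.ejb.Stateless;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;

@Path("/greeting")
@Stateless   // making the resource an EJB lets the container perform @EJB injection
public class GreetingResource {

    @EJB
    private GreetingBean greetingBean;

    @GET
    @Produces("text/html")
    public String get() {
        return greetingBean.greet("world");
    }
}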

Accessing an EJB from an Applet


you can access a bean from an applet; check the example if you want.

Accessing an EJB from JSP


in this example we will create a Remote Stateless EJB.

to do that, firstly we should define the remote interface:


then we implement the interface in the bean class



now we will use InitialContext to get the bean in the JSP.
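the book's listings are not copied here; a sketch of the remote interface and its bean (names are made up; the two types go in separate files):

// --- SalutationRemote.java ---
import javax.ejb.Remote;

@Remote
public interface SalutationRemote {
    String getFormalSalutation(String name);
}

// --- SalutationBean.java ---
import javax.ejb.Stateless;

@Stateless
public class SalutationBean implements SalutationRemote {
    @Override
    public String getFormalSalutation(String name) {
        return "Dear " + name;
    }
}

// the JSP then retrieves the bean with new InitialContext().lookup(...) using a
// java:global[/<app-name>]/<module-name>/... name, as in the JNDI recipe above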



Calling an EJB from JSF


calling an EJB from JSF is similar to JSP; however, before EJB 3.1 we had to define what is called a managed bean, which is like a wrapper around the EJB.

in this example we will see how to define a managed bean.

firstly we will define the bean


@Named is similar to @Component in Spring

then we define the managed bean


as you can see the managed bean is just a wrapper for the actual bean
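the book's listing is not copied here; a hypothetical sketch of such a wrapper (bean and property names are made up):

import javax.ejb.EJB;
import javax.faces.bean.ManagedBean;
import javax.faces.bean.RequestScoped;

// a thin JSF managed bean that only wraps the EJB
@ManagedBean
@RequestScoped
public class SalutationManagedBean {

    @EJB
    private Salutation salutation;   // the stateless bean sketched earlier

    // referenced from a facelet as #{salutationManagedBean.greeting}
    public String getGreeting() {
        return salutation.getFormalSalutation("world");
    }
}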

now we can use the bean like this


Accessing an EJB from a Java Application using JNDI


accessing an EJB from a Java application can be done easily using JNDI



Accessing an EJB from a Java Application using an embeddable container

you can access the EJB using what is called the embeddable container; the embeddable EJB container allows EJBs to be executed outside of a Java EE environment.

the code looks like this
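the book's listing is not copied here; a minimal sketch using the standard embeddable container API (the exact JNDI name depends on how the classes are packaged, so treat it as an assumption):

import javax.ejb.embeddable.EJBContainer;
import javax.naming.Context;

public class EmbeddedClient {
    public static void main(String[] args) throws Exception {
        // boots a lightweight EJB container inside this JVM
        EJBContainer container = EJBContainer.createEJBContainer();
        try {
            Context context = container.getContext();
            // the global name below assumes the beans are deployed from the classpath
            Salutation salutation = (Salutation) context.lookup("java:global/classes/Salutation");
            System.out.println(salutation.getFormalSalutation("world"));
        } finally {
            container.close();
        }
    }
}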


Accessing the EJB container

an EJB needs to access the container, that is, to use its services (security, transactions ...).

accessing the container happens through the EJBContext interface.


as you can see we declared a SessionContext (which extends EJBContext) and annotated it with @Resource

we have 3 contexts:

SessionContext for session beans
MessageDrivenContext for MDBs
EntityContext for entities
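a minimal sketch of injecting and using a SessionContext (the bean name is made up):

import javax.annotation.Resource;
import javax.ejb.SessionContext;
import javax.ejb.Stateless;

@Stateless
public class AuditBean {

    @Resource
    private SessionContext sessionContext;   // SessionContext extends EJBContext

    public String whoCalledMe() {
        // container services (security, transactions, ...) are reachable through the context
        return sessionContext.getCallerPrincipal().getName();
    }
}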

EJB 3.1 cookbook Chapter 2: Session Beans


Session beans come in 3 types: stateless, stateful and singleton

Stateless: no state
Stateful: keeps the state between calls
Singleton: IT IS STATEFUL, BUT THERE IS ONLY ONE INSTANCE PER APPLICATION

beans can be accessed locally (no-interface view or a local interface) or remotely
for local access you must be in the same JVM
parameters are passed by reference for local calls, and by value for remote calls.

Creating a stateless session bean




as you can see we use @Stateless and @LocalBean (there is no real need for @LocalBean, the no-interface view is the default).

the lifecycle of a stateless bean has @PostConstruct and @PreDestroy callbacks

Creating a stateful session bean




@Stateful
@PostConstruct
@PrePassivate
@PostActivate

Creating a singleton bean



@Singleton
@PostConstruct
@PreDestroy

Using multiple singleton beans




as you can see we used
@Singleton
@Startup: which means the bean is initialized as soon as the application starts up
@DependsOn("BEANNAME"): it means that the named bean (PlayerBean in the example) should be initialized before this bean
you can also write @DependsOn("x","y","z")
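a minimal sketch (PlayerBean is the dependency named in the example; the rest is made up):

import javax.annotation.PostConstruct;
import javax.ejb.DependsOn;
import javax.ejb.Singleton;
import javax.ejb.Startup;

@Singleton
@Startup                   // create this bean eagerly when the application starts
@DependsOn("PlayerBean")   // PlayerBean must be initialized before this bean
public class GameBean {

    @PostConstruct
    public void init() {
        System.out.println("GameBean initialized after PlayerBean");
    }
}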


Using container managed concurrency



by default the container is responsible for handling concurrent requests to a singleton bean: only one client can access the bean at a time, whether for read or write

you can change the concurrency behaviour by using @ConcurrencyManagement and @Lock


as you can see, we state here that the container (which is the default behaviour) takes care of handling the concurrency, and we limit getState() to a read lock and setState() to a write lock

you can also specify the timeout for acquiring a lock by using @AccessTimeout(5000)
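a minimal sketch combining those annotations (the bean is made up):

import javax.ejb.AccessTimeout;
import javax.ejb.ConcurrencyManagement;
import javax.ejb.ConcurrencyManagementType;
import javax.ejb.Lock;
import javax.ejb.LockType;
import javax.ejb.Singleton;

@Singleton
@ConcurrencyManagement(ConcurrencyManagementType.CONTAINER)   // the default anyway
public class StateBean {

    private int state;

    @Lock(LockType.READ)    // many callers may read concurrently
    @AccessTimeout(5000)    // wait at most 5 seconds to acquire the lock
    public int getState() {
        return state;
    }

    @Lock(LockType.WRITE)   // exclusive access while writing
    public void setState(int state) {
        this.state = state;
    }
}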

Using bean managed concurrency



with bean-managed concurrency, you have to handle everything yourself, e.g. using the synchronized keyword.



Using session beans with more than one business interface


you can use multiple interfaces with a bean, 





we used @Named just to use it with JSF


Understanding parameter behavior and granularity



we know that local beans run in the same JVM and remote beans in a different JVM

when you have a class with multiple private variables, you need one call to get each variable ==> in the case of local beans this is fine, as we are doing local calls; however, in the case of remote beans this is a lot of overhead. (FINE-GRAINED APPROACH)

you can also pass the whole object ==> only a single call, which is good in the remote case (COARSE-GRAINED APPROACH)

we know that when passing objects between JVMs we are passing by value, so it is good practice to make the object immutable.

let's take an example of the fine-grained approach:

we have this interface, which represents the orbit


the implementation for this interface


as you can see, if you want any value from this remote bean you have to make a call, so you need 6 calls to get all the orbit information


in the example above we made a call to get the eccentricity value; if you want to get the longitude you have to call position.getLongitude...().

a lot of calls.


however, if you go with the coarse-grained approach, you can define the remote interface like this


as you can see, there is just one method that returns an object


as you can see, it returns an object

and you can do the call like this


as you can see, orbitalElements.getPosition() returns the whole object in one call, so there is no need for further remote calls.
now when you call getEccentricity() you are making a local call
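a sketch of the coarse-grained variant, trimmed to two of the six orbit values (the names loosely follow the ones mentioned above; the two types go in separate files):

// --- OrbitalElements.java --- an immutable value object carrying several values at once
import java.io.Serializable;

public class OrbitalElements implements Serializable {
    private final double eccentricity;
    private final double longitude;

    public OrbitalElements(double eccentricity, double longitude) {
        this.eccentricity = eccentricity;
        this.longitude = longitude;
    }

    public double getEccentricity() {
        return eccentricity;
    }

    public double getLongitude() {
        return longitude;
    }
}

// --- Position.java --- coarse-grained remote interface: one remote call returns the whole object
import javax.ejb.Remote;

@Remote
public interface Position {
    OrbitalElements getPosition();
}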

Using an asynchronous method to create a background process



if you want a bean method to run asynchronously, so you don't have to wait for the result, you have 2 options:
1- Invoke and Forget: you invoke the method and you don't care about the result
2- Invoke and Return a Future: you invoke the method, and the bean stores the result in a Future object which you can access later.

we will see the 2 approaches in this example:


as you can see we use @Asynchronous with printAndForget().
and we return Future<String> in case we want a future object,
as you can see we return new AsyncResult<String>()

to use this bean:


as you can see we call futureResult.get() to obtain the result, and you have to handle the exceptions

NOTE: the Future object is not just for getting the result; you can also use it to cancel the task, check whether it has completed, and other things
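a minimal sketch of both flavours (the method names loosely follow the ones mentioned above):

import java.util.concurrent.Future;
import javax.ejb.AsyncResult;
import javax.ejb.Asynchronous;
import javax.ejb.Stateless;

@Stateless
public class ReportBean {

    @Asynchronous
    public void printAndForget(String message) {
        // "invoke and forget": the caller returns immediately, nobody waits for this
        System.out.println(message);
    }

    @Asynchronous
    public Future<String> buildReport(String name) {
        // "invoke and return a Future": the result becomes available later via Future.get()
        return new AsyncResult<String>("Report for " + name);
    }
}

a caller would then do Future<String> result = reportBean.buildReport("x") and later call result.get(), handling InterruptedException and ExecutionException.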


EJB 3.1 cookbook Chapter 3: Message-Driven Beans


we already know about Message-Driven Beans, so we will start with examples directly

Handling a string-based message


for a string-based message, you can simply write

and you can read the message


Handling a byte-based message



and reading from queue


Handling a stream-based message


and reading


Handling a map-based message




and the read

Handling an object-based message


and you read that


Using an MDB in a point-to-point application


all the previous examples were point-to-point, where we have the following architecture


also consider this architecture: sometimes it is better to have a chain of queues


Using MDB in a publish-and-subscribe application


in the case of publish/subscribe we should create a Topic.


this is how to receive the message; and as you remember, durability means that the message will stay in the topic if the subscriber is offline.

and here is how we send a message


Specifying which types of message to receive using the message selector




now, to read this specific type of message:
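a sketch of both sides: the sender marks the message with a property, and the MDB's activation config carries the selector (property and queue names are made up):

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;

// the sender marks the message, e.g.: message.setStringProperty("messageType", "urgent");
// this MDB only receives messages matching the selector below
@MessageDriven(mappedName = "jms/SomeQueue", activationConfig = {
        @ActivationConfigProperty(propertyName = "messageSelector",
                                  propertyValue = "messageType = 'urgent'"),
        @ActivationConfigProperty(propertyName = "destinationType",
                                  propertyValue = "javax.jms.Queue")
})
public class UrgentMessageBean implements MessageListener {

    @Override
    public void onMessage(Message message) {
        // handle only the urgent messages here
    }
}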



Browsing messages in a message queue


you can use a QueueBrowser to browse the queue.


EJB 3.1 cookbook Chapter 4 & Chapter 5 & Chapter 6: EJB Persistence & JPA Query & Transaction Processing

An entity is a class representing data persisted to a backing store using JPA. The @Entity annotation designates a class as an entity

Entities can also be declared in an orm.xml file

The persistence unit defines mapping between the entity and the data store
The persistence context keeps track of the state and changes made to its entities
The EntityManager manages the entities and their interaction with the data store

Creating an entity



@Entity: for defining an entity
@Id: marks the primary key field
@GeneratedValue: generates the primary key value
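a minimal sketch of such an entity (using the Patient class that the later query examples refer to; the fields are made up):

import java.io.Serializable;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

@Entity
public class Patient implements Serializable {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)   // let the provider pick the key strategy
    private Long id;

    private String firstName;
    private String lastName;

    public Long getId() { return id; }
    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }
    public String getLastName() { return lastName; }
    public void setLastName(String lastName) { this.lastName = lastName; }
}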

Creating an entity facade



the idea is to create an abstract facade with general functions:


now you implement this facade in your beans
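a sketch of the pattern: a generic abstract facade plus one concrete facade bean (the persistence unit name is an assumption; Patient is the entity sketched above):

// --- AbstractFacade.java --- generic CRUD operations shared by all entity facades
import java.util.List;
import javax.persistence.EntityManager;

public abstract class AbstractFacade<T> {

    private final Class<T> entityClass;

    protected AbstractFacade(Class<T> entityClass) {
        this.entityClass = entityClass;
    }

    protected abstract EntityManager getEntityManager();

    public void create(T entity) {
        getEntityManager().persist(entity);
    }

    public T find(Object id) {
        return getEntityManager().find(entityClass, id);
    }

    public void remove(T entity) {
        getEntityManager().remove(getEntityManager().merge(entity));
    }

    @SuppressWarnings("unchecked")
    public List<T> findAll() {
        return getEntityManager()
                .createQuery("SELECT e FROM " + entityClass.getSimpleName() + " e")
                .getResultList();
    }
}

// --- PatientFacade.java --- a concrete facade bean for the Patient entity
import javax.ejb.Stateless;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Stateless
public class PatientFacade extends AbstractFacade<Patient> {

    @PersistenceContext(unitName = "PatientPU")   // persistence unit name is an assumption
    private EntityManager entityManager;

    public PatientFacade() {
        super(Patient.class);
    }

    @Override
    protected EntityManager getEntityManager() {
        return entityManager;
    }
}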


Using the EntityManager


now what you can do is use the bean defined before like this:


Controlling the Object-Relationship Mapping (ORM) process


you define database information and persistence unit in persistence.xml


we have annotations like
@Table
@Column


Using embeddable classes in entities


you can embed one class within an entity, and they will be stored in the same table.

for example, you can have an employee class


and you can define the Address class like this


with @Embeddable


the table in the database will be a single table (EMPLOYEE) with both the employee and the Address fields.
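a minimal sketch of the two classes (fields are made up; the two types go in separate files):

// --- Address.java --- stored in the owning entity's table, not in a table of its own
import javax.persistence.Embeddable;

@Embeddable
public class Address {
    private String street;
    private String city;
    private String zipCode;
    // getters and setters omitted
}

// --- Employee.java ---
import javax.persistence.Embedded;
import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class Employee {

    @Id
    private Long id;
    private String name;

    @Embedded
    private Address address;   // the EMPLOYEE table gets the STREET, CITY, ZIPCODE columns
}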

Validating Fields



@NotNull
private String name;

@Null
private String name;

@Size(min=12)
private String name;

@Size(min=12, max=36)
@NotNull
private String name;

@Temporal(javax.persistence.TemporalType.DATE)
private Date dateOfBirth;

@Past // the value should be in the past
@Temporal(javax.persistence.TemporalType.DATE)
private Date dateOfBirth;

@Future // the value should be in the future
@Temporal(javax.persistence.TemporalType.DATE)
private Date dateOfBirth;

@Pattern(regexp="\\d{5}(-\\d{4})?")
private String zipCode;


@AssertTrue// this means resident should be true
private boolean resident;

@Min(12)
@Max(48)
private int monthsToExpire;

you can also use a Validator class to do the validation
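a minimal sketch of programmatic validation with the standard Bean Validation API (assuming the Patient entity above carries constraints such as @NotNull on its fields):

import java.util.Set;
import javax.validation.ConstraintViolation;
import javax.validation.Validation;
import javax.validation.Validator;

public class PatientValidation {
    public static void main(String[] args) {
        Validator validator = Validation.buildDefaultValidatorFactory().getValidator();

        Patient patient = new Patient();   // fields left unset on purpose
        Set<ConstraintViolation<Patient>> violations = validator.validate(patient);
        for (ConstraintViolation<Patient> violation : violations) {
            System.out.println(violation.getPropertyPath() + " " + violation.getMessage());
        }
    }
}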



Chapter 5 JPA Query



nothing much here; we will just add a few examples

1- create and run a query
@Override
public List<Patient> findAll() {
    Query query = entityManager.createQuery("SELECT p FROM Patient p");
    List<Patient> list = query.getResultList();
    return list;
}

2- control the number of returned entities
Query query = entityManager.createQuery("SELECT p FROM Patient p");
query.setMaxResults(querySize);
query.setFirstResult(beginIndex);
List<Patient> list = query.getResultList();

3- delete query
public int delete(String firstName, String lastName) {
    Query query = entityManager.createQuery("DELETE FROM Patient p WHERE p.firstName = '"
            + firstName + "' AND p.lastName = '" + lastName + "'");
    int numberDeleted = query.executeUpdate();
    return numberDeleted;
}

4- update query
public int updateDosage(String type, int dosage) {
    Query query = entityManager.createQuery("UPDATE Medication m SET m.dosage = "
            + dosage + " WHERE m.type = '" + type + "'");
    int numberUpdated = query.executeUpdate();
    return numberUpdated;
}

5- use parameter in query
public List<Patient> findByLastName(String lastName) {
    Query query = em.createQuery("SELECT p FROM Patient p WHERE p.lastName = :lastName");
    query.setParameter("lastName", lastName);
    List<Patient> list = query.getResultList();
    return list;
}

6- using named query
@Entity
@Table(name="MEDICATIONS")
@NamedQuery(name="findByType",
            query="SELECT m FROM Medication m WHERE m.type = ?1")
public class Medication implements Serializable { ... }

public List<Medication> findByType(String type) {
    Query query = entityManager.createNamedQuery("findByType");
    query.setParameter(1, type);
    return query.getResultList();
}

7- Using the Criteria API

public void findAllMales(PrintWriter out) {
    CriteriaBuilder criteriaBuilder = getEntityManager().getCriteriaBuilder();
    CriteriaQuery<Patient> criteriaQuery = criteriaBuilder.createQuery(Patient.class);
    Root<Patient> patientRoot = criteriaQuery.from(Patient.class);
    criteriaQuery.where(criteriaBuilder.equal(patientRoot.get("sex"), "M"));
    List<Patient> patients = getEntityManager().createQuery(criteriaQuery).getResultList();
    for (Patient p : patients) {
        out.println("<h5>" + p.getFirstName() + "</h5>");
    }
}

CMTs can be used with session beans, message-driven beans, and entities. However, BMTs
can only be used with session- and message-driven beans.

CHAPTER 6 Transactions



you have either container-managed transactions or bean-managed transactions.

by default we have container managed transactions.

Using the SessionSynchronization interface with session beans


if you implement the SessionSynchronization interface you can use callbacks like afterBegin, beforeCompletion and afterCompletion


there is something important called the TransactionAttributeType, which you can set on methods or classes

REQUIRED – Must always be part of a transaction
REQUIRES_NEW – Requires the creation of a new transaction
SUPPORTS – Becomes part of another transaction if present
MANDATORY – Must be used as part of another transaction
NOT_SUPPORTED – May not be used as part of a transaction
NEVER – Similar to NOT_SUPPORTED but will result in an EJBException being thrown


A Message Driven Bean (MDB) only supports the REQUIRED and NOT_SUPPORTED values.

usually the TransactionAttributeType is set at the method level; it defines how the method behaves depending on whether the caller already has a transaction or not
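a minimal sketch of setting the attribute per method (the bean and method names are made up):

import javax.ejb.Stateless;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;

@Stateless
public class BillingBean {

    // joins the caller's transaction, or starts a new one if none exists (the default)
    @TransactionAttribute(TransactionAttributeType.REQUIRED)
    public void charge(long accountId, double amount) {
        // ...
    }

    // always runs in its own, brand-new transaction, regardless of the caller
    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    public void audit(String entry) {
        // ...
    }
}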


Handling transactions manually


firstly


then you should start and commit transactions yourself
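a minimal sketch of a bean-managed transaction (the bean is made up):

import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.ejb.TransactionManagement;
import javax.ejb.TransactionManagementType;
import javax.transaction.UserTransaction;

@Stateless
@TransactionManagement(TransactionManagementType.BEAN)   // switch this bean to BMT
public class TransferBean {

    @Resource
    private UserTransaction userTransaction;

    public void transfer() throws Exception {
        userTransaction.begin();
        try {
            // ... do the transactional work ...
            userTransaction.commit();
        } catch (Exception e) {
            userTransaction.rollback();
            throw e;
        }
    }
}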


NOTE: 

Rolling back a transaction


in the case of a bean-managed transaction you can roll back using these methods:
UserTransaction.rollback(): causes an immediate rollback of the transaction
SessionContext.setRollbackOnly(): marks the transaction for rollback; the transaction is not interrupted, it continues to the end.

in the case of a container-managed transaction you can only use setRollbackOnly().

Handling errors in a transaction


If an unchecked exception is thrown, a transaction is automatically rolled back. For checked exceptions, the UserTransaction's rollback method or the SessionContext's setRollbackOnly method are used to explicitly force a rollback.

when you define an application exception you can specify whether it should cause a rollback or not.

Using timeouts with transactions


if you are using container-managed transactions you can change the transaction timeout from the container's admin GUI.


for bean-managed transactions you can use:
UserTransaction.setTransactionTimeout(10);

EJB 3.1 cookbook Chapter 7 EJB Security

you can use annotations to secure access to methods, or you can use code; use code when the annotations cannot express what you want (e.g. access is allowed only in the morning).

when we talk about security we talk about realms, users, groups and roles.
we define the realm, and under it the users and groups, in the Java EE server (e.g. GlassFish); they usually have a GUI for that.



the roles are defined at the application level; we assign the roles to groups and users.



as you can see we defined the roles and security-constraint in web.xml

mapping roles to groups and users should be done in (sun-application.xml, sun-web.xml, or
sun-ejb-jar.xml) depending on how the application is deployed


now when you write code



as you can see, you declare the roles that the class will handle, then you use @RolesAllowed, @PermitAll and @DenyAll.
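a minimal sketch of those annotations on a made-up bean:

import javax.annotation.security.DeclareRoles;
import javax.annotation.security.DenyAll;
import javax.annotation.security.PermitAll;
import javax.annotation.security.RolesAllowed;
import javax.ejb.Stateless;

@Stateless
@DeclareRoles({"employee", "manager"})   // the roles this class works with
public class SalaryBean {

    @RolesAllowed("manager")             // only callers in the manager role
    public void raiseSalary(long employeeId, double amount) {
        // ...
    }

    @PermitAll                           // any caller may invoke this
    public double getAverageSalary() {
        return 0.0;
    }

    @DenyAll                             // nobody may invoke this through the container
    public void recalculateEverything() {
        // ...
    }
}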


sometimes you need a class to run with a higher role; let's say you have the employee role and you want to call something which needs the manager role. the class itself can allow that by using the @RunAs annotation


How to control security dynamically
after the user is authenticated by the Java EE server, the user is represented as a Principal object in the context; you can use this object for programmatic checks.
some of the calls that you might need

Principal principal = sessionContext.getCallerPrincipal();
principal.getName()
sessionContext.isCallerInRole("manager")


EJB 3.1 cookbook Chapter 8 Interceptors 

To use interceptors:
1- define your interceptor class (see the sketch after this list)


2- specify where you want to use this interceptor

as you can see, here the interceptor is applied at the class level


3- you can define multiple interceptors like this:



4- when you define an interceptor at the class level, it will be applied to all methods; if you want to exclude some methods, you write:


5- you can also define an interceptor at the method level


6- you can define an interceptor for all EJBs.
you do that by adding <interceptor-binding> in ejb-jar.xml

7- as you can see, the interceptor method takes a parameter of type InvocationContext; this parameter has useful methods like:


8- you can also annotate interceptor methods with @PreDestroy, @PostConstruct, @PrePassivate and @PostActivate
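a minimal sketch tying steps 1, 2 and 7 together (class and method names are made up; the two classes go in separate files):

// --- SimpleInterceptor.java --- (step 1) logs every intercepted call
import javax.interceptor.AroundInvoke;
import javax.interceptor.InvocationContext;

public class SimpleInterceptor {

    @AroundInvoke
    public Object log(InvocationContext context) throws Exception {
        System.out.println("Entering " + context.getMethod().getName()
                + " on " + context.getTarget().getClass().getSimpleName());
        try {
            return context.proceed();   // continue to the next interceptor or the method itself
        } finally {
            System.out.println("Leaving " + context.getMethod().getName());
        }
    }
}

// --- RegistrationBean.java --- (step 2) the interceptor applied at the class level
import javax.ejb.Stateless;
import javax.interceptor.Interceptors;

@Stateless
@Interceptors(SimpleInterceptor.class)
public class RegistrationBean {

    public void register(String name) {
        // every call to this method goes through SimpleInterceptor first
    }
}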

EJB 3.1 cookbook Chapter 9 Timer Service & Chapter 10 Web Services & Chapter 11 Packaging EJB & Chapter 12 EJB Techniques 

To schedule a method to run at a specific time



you can also create a timer programmatically
1- define a timer service resource
@Resource
TimerService timerService;
2- create an action timer:
3- create a timeout function


createSingleActionTimer() creates a one-time event
createIntervalTimer() creates interval events
createCalendarTimer() creates calendar-based events

the Timer object passed to the timeout function has a lot of useful methods
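a minimal sketch of the programmatic variant (names are made up):

import javax.annotation.Resource;
import javax.ejb.Singleton;
import javax.ejb.Timeout;
import javax.ejb.Timer;
import javax.ejb.TimerConfig;
import javax.ejb.TimerService;

@Singleton
public class ReminderBean {

    @Resource
    private TimerService timerService;   // step 1: the timer service resource

    public void scheduleReminder(String info) {
        // step 2: fire once, 60 seconds from now; the TimerConfig carries the info payload
        timerService.createSingleActionTimer(60000, new TimerConfig(info, true));
    }

    @Timeout
    public void onTimeout(Timer timer) {
        // step 3: the timeout callback
        System.out.println("Timer fired: " + timer.getInfo());
    }
}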


Persistent vs non-persistent timers
a persistent timer means that if the server is down, the missed event is recorded and executed later.
you can make a timer persistent in @Schedule

@Schedule(second="0", minute="*", hour = "*", info="", persistent=true)


//////////////////////////////////////////////////////////////////////////////////////////////

Chapter 10 Web Services


to define web services you can use JAX-WS 
@WebService, @WebMethod and @WebParam


in order to define RESTFul services, you can use JAX-RS
@Path, @GET, @Produces("text/html"), @QueryParam, 

//////////////////////////////////////

Chapter 11 Packaging the EJB


1- *-ejb.jar: this jar file contains your EJBs; the deployment descriptor is ejb-jar.xml, placed inside META-INF (if you annotate your classes there is no need for ejb-jar.xml).
2- *.war: hosts your web application; the deployment descriptor is web.xml, placed inside WEB-INF.
3- *.ear: holds jars and wars; application.xml is the deployment descriptor.
4- *.rar: defines resource adapters; this is related to the Java EE Connector Architecture, used for integration; the deployment descriptor is ra.xml inside META-INF





then the chapter talks about class loading; as we know, class loading is vendor-specific, it is not standardized by the specification.


///////////////////////////////////////////////

Chapter 12 EJB Techniques



this chapter talks about general topics, like handling currency, handling exceptions, using interceptors to handle exceptions, logging, and so on.