Parsing Huge Text Files Using Java or Linux commands

Published on June 19, 2010 by Amir Sedighi

First Scenario: Using Java as the programming language
A friend of mine and I decided to parse a huge text file containing reports from legacy devices. After a few attempts we realized that opening and parsing huge text files in Java is pretty time and resource consuming. We started with a 35MB log file; we had never worked with text files of that size before, so we looked for a suitable solution. Java is arguably not the best tool for this kind of problem, and I believe Python or Perl could do the job with better performance. However, considering later development and the project requirements, we decided to use Java. After some searching on the web we found a brilliant tool. Tigris hosts some valuable open source projects, and JSaPar is one of them. JSaPar is a Java library providing a schema-based parser/producer of CSV (Comma Separated Values) and flat files.
The file is imported into an object-oriented model that we called telegrams. The parser produces a Document object representing the content of the file, or you can choose to receive an event for each line that has been parsed successfully. The project claims that JSaPar can handle huge files without loading everything into memory.
The library is simple to use and can be extended. Our log file consists of thousands of lines just like the sample line below:

948853 : 47 E6 18 FF 04 CD 0B 1D B1 C1 D1 1E ;

This is a telegram. The first part is the row number (948853) and the following bytes contain a message. The two parts are separated by a “:”. At first sight it looks like a straightforward procedure, but it is not as easy as it looks. Millions of these lines make for a really slow and unstable application if you use the standard Java Scanner or similar parsers. First we defined a schema for csv files:

Then we just used some simple Java code to read a 40MB text file into memory in less than 10 seconds.

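public final void loadTelegrams() throws SchemaException, IOException, JSaParException {
	Document telegrams;
	// Build the parser from the XML schema that describes one telegram line.
	Reader schemaReader = new FileReader("schema/schema.xml");
	Xml2SchemaBuilder xmlBuilder = new Xml2SchemaBuilder();
	Reader fileReader = new FileReader("repo/dat.txt");
	try {
		Parser parser = new Parser(xmlBuilder.build(schemaReader));
		// Parse the whole log file into an in-memory Document.
		telegrams = parser.build(fileReader);
	} finally {
		// Close both readers even if parsing fails.
		fileReader.close();
		schemaReader.close();
	}
}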

Using the call below we can move through the whole file cell by cell very quickly.

telegrams.getLine(i).getCell(j).getStringValue();
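
For comparison, here is a minimal sketch of splitting a single telegram line with plain Java, without JSaPar. The Telegram class and its parse method are hypothetical names made up for this illustration; they are not part of the library, and the sketch assumes every line follows the "row : hex bytes ;" layout of the sample above.

```java
// Hypothetical helper for illustration only; not part of JSaPar.
final class Telegram {
    final long rowNumber;
    final int[] bytes;

    Telegram(long rowNumber, int[] bytes) {
        this.rowNumber = rowNumber;
        this.bytes = bytes;
    }

    // Parses a line such as "948853 : 47 E6 18 FF ;" (assumed layout).
    static Telegram parse(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Missing ':' in line: " + line);
        }
        // Everything before the colon is the row number.
        long row = Long.parseLong(line.substring(0, colon).trim());
        // Everything after it is the message; drop the trailing ';'.
        String payload = line.substring(colon + 1).replace(";", "").trim();
        String[] tokens = payload.isEmpty() ? new String[0] : payload.split("\\s+");
        int[] bytes = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            bytes[i] = Integer.parseInt(tokens[i], 16); // each token is one hex byte
        }
        return new Telegram(row, bytes);
    }
}
```

Calling Telegram.parse on each line returned by BufferedReader.readLine() would keep only one line in memory at a time, which is the same idea as the event mode described above.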

Second Scenario: Using Linux Commands
Linux has an arsenal of commands and utilities for working with large files. Here I just refer to sed and perl.
sed is a good choice for manipulating large text files in Linux because it doesn’t read the whole file into memory at once in order to change it. As the manual notes:
While in some ways similar to an editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient.

Here is how to drop every line containing the string “removeMe” (the result is written to standard output; the file itself is not modified):

sed '/removeMe/d' aHugeSizeText.txt

And here is a replacement sample using perl, which edits the file in place:

perl -i -p -e 's{the_source_string_that_should_change}{the_destination_string}g' aHugeSizeFile.txt
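
The same one-pass idea can be sketched in Java for readers who stay with the first scenario. This is only an illustration under assumptions: LineFilter and dropLinesContaining are made-up names, the match is a plain substring test rather than a sed regular expression, and lines are written back with a Unix newline.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

// Hypothetical helper for illustration only.
final class LineFilter {
    // Copies input to output line by line, skipping every line that
    // contains the given text. Returns the number of skipped lines.
    static int dropLinesContaining(Reader input, Writer output, String needle) throws IOException {
        BufferedReader reader = new BufferedReader(input);
        BufferedWriter writer = new BufferedWriter(output);
        int removed = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.contains(needle)) {
                removed++; // literal substring match, not a regex
                continue;
            }
            writer.write(line);
            writer.write('\n'); // Unix newline, by assumption
        }
        writer.flush();
        return removed;
    }
}
```

Passing a FileReader and a FileWriter for a second file gives roughly the effect of the sed command above, with only one line held in memory at a time.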

Posted in Java, Linux

Recover MySQL root password

Published on June 19, 2010 by Seroj Alaverdian

In order to change the MySQL root password on Linux, follow these simple steps:

1. Log in to Linux as root
2. Stop the MySQL server

/etc/init.d/mysql stop

3. Start MySQL in safe mode with --skip-grant-tables so there will be no prompt for a password

/usr/bin/mysqld_safe --skip-grant-tables &

4. Now log in to MySQL

mysql -u root

5. Update the password and flush privileges

UPDATE mysql.user SET Password=PASSWORD('newrootpassword') WHERE User='root';
FLUSH PRIVILEGES;

6. Exit and stop the MySQL server (safe mode)

/etc/init.d/mysql stop

At this point your MySQL root password has been changed successfully.
Now you can start your MySQL server and log in as root with the new password you’ve set.

Posted in Linux, MySQL

Loading UTF8 text data into a MySQL table

Published on June 16, 2010 by Kamran Vatanabadi

Even if you set the character set and collation of the fields or table to UTF8, a simple “load data infile” command doesn’t import UTF8 text correctly into the table. The solution is easy: add “character set UTF8” after the table name to tell MySQL which encoding the file uses.

For example, the following command imports my_data.txt from my Desktop into the my_data table:

load data local infile '/Users/kamran/Desktop/my_data.txt'
into table my_data character set UTF8 fields terminated by ',';

Note that the character set/collation of the fields containing UTF8 data must already be set to “UTF8” too. The “local” part of the command is also a must if you want to load data from your own computer; otherwise you’ll see error #13.

Posted in MySQL

Port Mapping in Linux

Published on June 15, 2010 by Amir Sedighi

If you are used to setting up web servers on port 80 in Windows, you will find that Linux needs a bit of customization before you can use port 80. Linux has its own security mechanism here: ordinary users cannot bind to ports below 1024, because those ports belong to the super user.
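
A quick way to see this restriction from Java is to try to open a listening socket and watch whether it fails. The sketch below is an illustration only; PortCheck and canBind are made-up names, and the result depends on the user running it and on how the system is configured.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper for illustration only.
final class PortCheck {
    // Returns true if this process is allowed to listen on the given TCP port.
    static boolean canBind(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true; // the socket is closed again when the try block ends
        } catch (IOException e) {
            // Usually a BindException: permission denied or port already in use.
            return false;
        }
    }
}
```

Run as an ordinary user on a typical Linux setup, PortCheck.canBind(80) is expected to return false while PortCheck.canBind(8080) returns true as long as nothing else is listening there.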
Moreover, it’s recommended to run web servers as a limited user that can’t hurt the underlying services. So the best way to run an HTTP listener on port 80 is to map port 80 to a port above 1024, much like NAT in the IP networking world. Some months ago, while I was working with a JBoss application server running on port 8080, I decided to move the listener to port 80. I didn’t want to change the JBoss settings to do this, so I did it by running three iptables commands which I found in some tutorials on the web.

sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8080
sudo iptables -I INPUT -p tcp --dport 8080 -j ACCEPT
sudo iptables -t nat -A OUTPUT -o lo -p tcp --dport 80 -j REDIRECT --to-port 8080

Isn’t it easy?
NOTE: if you don’t put the -o lo option in the last command, all outgoing requests to port 80 will be redirected to port 8080 on the localhost.

Finally, you should save the new rules to disk. The command below persists the changes.

sudo sh -c "iptables-save > /etc/iptable.rules"

You also need to make sure the rules are loaded at the next startup. So, add the line below to /etc/network/interfaces after the iface line:

pre-up iptables-restore < /etc/iptable.rules

If you are using CentOS, it is enough to run the command below to persist the iptables rules:

/sbin/service iptables save

Given how similar the Linux flavors are, I think this should work on other flavors either exactly as described or with a bit of modification. Just try it and keep your app server secured on port 80.

Posted in Linux

Software Graduate Students in IRAN

Published on June 14, 2010 by Kamran Vatanabadi

We have hundreds of software graduates each year in IRAN. Universities offer many engineering degrees such as computer science, software engineering, computer architecture, IT engineering and so on. Although students from different universities differ according to the rank of their university, sharp and up-to-date students usually have no problem finding a proper job. Here are some issues I have seen during more than 20 years of work experience.

Graduates usually have no work experience
One of the problems with new graduates is that they usually have no work experience, except for those who worked during their studies. If you hire a new graduate who is unskilled but sharp enough, you have to pay the price of their training. It usually takes 2 or 3 months for this kind of employee to start being useful, but if you want to know how much you can trust them at work, you have to wait and examine them in different aspects of the work for, say, a year.

Academic education is not always necessary
If someone has enough knowledge and work experience, he or she has many opportunities to work in private companies in IRAN even with no academic education. Private companies usually look at the benefit and return on their investment, so they have no problem hiring employees who have the required skills but no university degree. I’ve seen and worked with many people without academic education who were very sharp and active, while many university graduates were just ordinary engineers.

The problem with working in government-related organizations or companies is that they usually require an academic degree for hiring. Many students who don’t want a serious job usually look for such vacancies. Easy work, low responsibility and a high level of job security are what these people want, so most of them have little chance of being employed by the private sector.

But there are still some small differences
I’m not going to state this as a rule for all people without academic education, but most of them have some gaps in the theoretical side of software engineering. In fact most of them can be very good programmers, better than formally educated ones, but not good conceptual designers or managers. Academic studies usually teach people to develop an engineering mind and to think their way through a problem, while personal experience usually gives people skill with tools rather than an approach to solving problems. These people can usually solve problems they have previous experience with very well, but not necessarily new ones.

Some of them also don’t have the engineering intelligence required to work on a project from A to Z. This is why you usually have to hire a mix of formally educated and self-educated people to cover the skills you need.

Lack of programming skills
Most graduates in IRAN don’t have the required programming skills. This is where you have to invest your time and money to turn them into good programmers. They mostly learn Java and C, the foundations of OOP and many software development methodologies, but still have no idea how to use this knowledge or how to start even a small project. I don’t know whether this is a problem specific to our country’s university graduates, but this is what we face in IRAN.

Do not give simple but sensitive and important tasks to these people!
Yes, never give even a simple task, for example a file copy on a live server, to these people. These young and low-skilled employees don’t understand the meaning of live data and have no perception of the responsibility that comes with it! In fact, working with live or sensitive systems requires a self-assurance that is only achievable by working on live systems continuously for many years. Even experienced and skilled employees can make strange mistakes on live systems, so don’t take the risk: do these simple tasks yourself if you don’t have such a colleague.

Posted in Software Engineering

OSS Architecture (OSS Story)

Published on June 13, 2010 by Kamran Vatanabadi

We talked about the architecture of the legacy system in our “Design of the legacy CRM” post and saw the problems our customer used to deal with just because of the system’s architecture and platforms. We decided to use the following architecture; let us have a look at it:

Using a web application model is necessary today for businesses, especially for those who want to give better service to their customers via the web or who have branches or agents in other geographical locations. In fact I’m asking: what better model can you suggest for an ISP, when all of its customers are web-enabled? Nothing is better for a company than having all of its customers on the web.

Tomcat as Servlet Container
We started to work with Tomcat and still think it is a good choice for a servlet container. I’ll talk about the framework we used to build the OSS application later, but for now I’ll just say that we are happy and satisfied using servlets and JSP to build the OSS application. Honestly, we have faced no critical problem yet, while the system currently has around 100 online users (I mean employees) and many other agents and customers working with the system via the ISP’s website.

MySQL as database
There is no doubt that MySQL is one of the fastest databases on the market, and it fits web applications especially well. At the moment the OSS databases contain around 10 million records in total, and the system works as fast as in the first days when we had only 10 thousand records. We have used many useful strategies and methods to store the data and keep it light, but when I think about the days we worked with ORACLE or SQLServer and the problems we had with their maintenance, I totally believe in MySQL; it has never brought us any serious problem.

How does it work?
All users use just a web browser to work with the application: the company’s employees via the LAN, and the agents (and customers) via the Internet, with or without VPN. Note that in this architecture we don’t have the race for access to Network Elements that was a problem in the legacy system. The single OSS application on Tomcat manages access to each Network Element, so things work well in this model.

Posted in OSS

Creating Object In Javascript

Published on June 10, 2010 by Amir Sedighi

JavaScript makes object instantiation easier than many other languages in some ways. If you need just a single object, you can define it directly without defining a class. Here is a sample of declaring an object without any class definition:

var myBook = {
  title: "Little Prince",
  price: 10,
  getAQuote: function() {
    alert("'What makes the desert beautiful is that somewhere it hides a well.'");
  }
};

myBook.getAQuote();  // calling a method.

This works when you need to create a single object. You can also create objects the way other regular languages do:

// Book class
function Book() {
  this.price = 0;
  this.title = "";
  this.getIntro = function() {
    alert("Title:" + this.title + " , Price:" + this.price);
  }
}

// Instantiating an instance of Book class
var aBook = new Book();
aBook.price = 20;
aBook.title = "Little Prince";

// Instantiating another instance of Book class
var anotherBook = new Book();
anotherBook.price = 10;
anotherBook.title = "Animal Farm";

aBook.getIntro();
anotherBook.getIntro();
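
For contrast, this is roughly what the same Book looks like in a class-based language such as Java, where even a single object needs a class declaration first. It is a sketch for comparison only: getIntro returns the text instead of showing an alert, and the constructor parameters are my own choice.

```java
// A class-based counterpart of the JavaScript Book above (illustration only).
final class Book {
    private final String title;
    private final int price;

    Book(String title, int price) {
        this.title = title;
        this.price = price;
    }

    // Builds the same text that the JavaScript version passes to alert().
    String getIntro() {
        return "Title:" + title + " , Price:" + price;
    }
}
```

An instance is then created with new Book("Little Prince", 20), and its fields cannot be added or changed freely afterwards the way they can on a JavaScript object.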

Posted in JavaScript

How to be a successful investor in software development field – Part 3

Published on June 9, 2010 by Amir Sedighi

Is the project in your profession?
Software projects can always be fitted completely or partially into the categories below:

A. The projects that are in your profession and you know exactly how you should perform them.
B. The projects that aren’t in your profession and you don’t know exactly how you should perform them.

It can be better to outsource type B projects to someone who can handle the project as type A. In this case you just need to provide clear and detailed documents that cover commitments and scheduling. But in some cases you need to outsource a project of type A. Lack of resources, timing constraints and scheduling may make outsourcing the more suitable choice. In this case you know how to do the job, and you will want to manage the job remotely during the project. Preparing a configuration management setup will be useful. Effective use of source control, continuous integration and day-to-day task management puts you on the safe side.

Remote management tools and techniques
A combination of source control, continuous integration and task management helps you keep the project aligned with its goals. With this arrangement the code remains clean and largely bug free. Unit tests that pass help ensure that the code is reliable. The tests are run automatically by continuous integration, so the notification process doesn’t need any special activity beyond the initial configuration. Using day-to-day task management that can link to committed source through meaningful commit comments helps you track the chain of requirement, assigned task and development or bug fix easily. Using HTTP-based services lets remote team members share things more easily.

Wiki
A wiki knows a lot of tips and tricks if you have shared your knowledge with it. Use the wiki whenever you work out a procedure or change a part of the code or documentation during the project. Attach artifacts, put links, write notes, and be sure you will never regret it. Imagine how much better it would be to let a new member find out his job specification, the parts already accomplished, the remaining parts and the knowledge needed just by sending an email that contains some hyperlinks. Add chat transcripts to the wiki to let others find out what has been done during the project. Practical use of a wiki requires a different mindset. It might seem difficult and boring at first, but the payoff will be well worth the effort.

Conclusion
In the software market there are still a lot of opportunities for software development companies and investors. These opportunities are similar to hidden gold mines. Finding the gold mine depends on the approach that you select to develop the application. Outsourcing is a pretty successful method if you know which parts of the development should be done that way. This depends on the project definition, your profession and your potential. We explained some simple techniques to reduce the risk of both development and outsourcing.

Posted in Software Engineering

OutOfMemoryError: PermGen

Published on June 8, 2010 by Kamran Vatanabadi

“OutOfMemoryError: PermGen” or simply “PermGen” is a familiar error message on Tomcat (or any Tomcat-based app server such as JBoss) for those who have large and/or long-running applications on Tomcat or who deploy/undeploy many applications on it. Sorry, but if you haven’t seen this message on Tomcat’s console before, you are still a beginner or your application is small! It is not just something that shows up in Tomcat’s log file: if it happens, the system will crash and won’t respond to requests anymore, then you have to restart Tomcat and you may lose your data!

OK, why does it happen? It happens because of a memory leak in a special area of JVM memory, the permanent generation (PermGen), which mainly holds class metadata and is managed by the GC or garbage collector. The default size of the PermGen area is 64M, and if classes are not released properly during the application’s normal work or during the application deploy/undeploy process, this area gradually fills up and after a while there won’t be enough space in it to handle your application.

Many believe that changing the JVM vendor (from Sun to …) will solve the problem, but when we faced this problem we weren’t actually using the Sun JVM; we had diablo-1.5 on FreeBSD 7.1 on our live server. At first we restarted Tomcat after each re-deployment of the application. This solution worked more or less, because restarting Tomcat naturally gives you a fresh 64M area again. But if your application stays alive for a long time without restarting, you’ll face the problem again, as we did.

The problem exists, and you have to find a way to deal with it. The simplest way is to increase the memory allocated to this area. Here are the options to set it to a guaranteed 128M, I mean both min and max are equal to 128M:

-XX:PermSize=128m -XX:MaxPermSize=128m

You have to pass these options to the JVM when it starts your application, and in our model, since Tomcat runs our web application, we had to pass them to Tomcat. We put them in “rc.conf” after the other Tomcat settings as below:

tomcat55_enable="YES"
tomcat55_java_opts="-Xms2048m -Xmx2048m -XX:PermSize=128m -XX:MaxPermSize=128m"

The first two settings are the JVM’s minimum and maximum heap size, and the rest are for the PermGen area. But there is still something important I have to add: you have to find the best PermGen size by trial and error. Giving 256M to this area doesn’t necessarily solve your problem, and it may also reduce the overall performance of your application. In our experience setting the size to 256M made the application very slow, but 128M not only had no side effect on the application’s performance, it also solved the problem for good.
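
To take some of the guessing out of the trial and error, the JVM can report how full each of its memory areas is through the standard java.lang.management API. The sketch below is only an illustration: MemoryPools and describe are made-up names, and the pool names in the output differ between JVM vendors, versions and garbage collectors. On JVMs of this generation the relevant pool usually has “Perm Gen” in its name, while from Java 8 onwards PermGen no longer exists and a pool named Metaspace appears instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class MemoryPools {
    // Returns one line per JVM memory pool: name, type, used and max bytes.
    static List<String> describe() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when no maximum is defined for the pool.
            lines.add(pool.getName() + " [" + pool.getType() + "]"
                    + " used=" + usage.getUsed() + " max=" + usage.getMax());
        }
        return lines;
    }
}
```

Logging these lines from inside the web application before and after a few redeployments should show whether the PermGen pool keeps growing.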

Posted in Java

Simple Cocoa Status Menu

Published on June 7, 2010 by Kamran Vatanabadi

It has been more than 15 years since I last wrote code for Windows. I remember it was difficult and tricky to move an application into the tray, but here in Mac OS X you can easily put your application in the status menu. You can download a sample status menu from here; now let us take a look at the key parts of this work.

First of all you have to add a menu from the object library to your application; this menu is going to appear when you click on your status menu. The key class for a status menu is NSStatusItem; this is the object that actually makes up the status menu. The definitions in the header file are as below:

@interface SimpleStatusMenuAppDelegate : NSObject  {
    NSWindow *window;
    IBOutlet NSMenu *statusMenu;
    NSStatusItem *statusItem;
}

- (IBAction)doBeep:(id)sender;
- (IBAction)doOpenGoogle:(id)sender;

@property (assign) IBOutlet NSWindow *window;

@end

As you see, we have two actions which support our menu items, “Beep” and “Open google.com”. These items are added to the menu with Interface Builder. As we have already discussed in our previous posts, we initialize most of our variables in awakeFromNib. Calling statusItemWithLength: on NSStatusBar’s systemStatusBar returns a new status item for the system status bar, as below:

statusItem = [[[NSStatusBar systemStatusBar]
      statusItemWithLength:NSVariableStatusItemLength] retain];
[statusItem setMenu:statusMenu];
[statusItem setTitle:@"myFavorites"];
[statusItem setHighlightMode:YES];

In the first line we have also asked NSStatusBar to give us a variable-length status item. The second line attaches the menu to the status item. The next one sets its title, and the last line makes the status item highlight when you click to open its menu. If you pass NO, clicking on the status item opens the menu without highlighting it.

There is also one more thing you have to do! You need to add the key “Application is agent (UIElement)” to the -Info.plist file with a checked value. This keeps your application out of the Dock.

download sample status menu application

Posted in Cocoa/Objective-C
