Storing database data in files?

Question!

I'm currently working on a school project, in java, and I'm coding a database application. Something like the MySQL Monitor where you type in queries and get results / whatever.

In applications I've coded before, I used databases to store data, like user profiles, settings, etc. Now, obviously, I can't use a database to store data generated from this school project, otherwise what's the point?

I'm thinking about storing the data in files but that's the only idea I have in my mind right now and I'm kinda running dry.. and to be honest, I don't want to start banging at code and then I discover a better way of doing it.

So if anyone has any idea how to store the data (like CSV?), or has some kind of knowledge of how database applications work internally, can you please shed some light?

-- EDIT: just to be more clear, I can't use database engines to store the data, to put it this way, I'm coding a simple database engine. Ideas like what Galwegian, jkramer and Joe Skora suggested is what I'm looking for.



Answers

The basics of storing records in blocks in data files have been around for decades. Obviously there are a great many variations on a theme, and all of them are designed to work around the fact that we have slow disk drives.

But the fundamentals are not difficult. Combining fixed length columns with a fixed number of columns can give you very rapid access to any record in your database.

From there, it's all offsets.

Let's take the example of a simple row containing 10 32-Bit integers. A single row would be 40 bytes (4 bytes per integer * 10). If you want row 123, simply multiply it by 40. 123 * 40, gives you an offset of 4920. Seek that far in to the database file, read 40 bytes, and voila, you have a row from your database.

Indexes are stored in B+-Trees, with tree nodes distributed across blocks on the disk. The power of the B+Tree is that you can easily find a single key value within the tree, and then simply walk the leaf nodes to scroll through the data in key order.

For a simple format that's useful and popular, consider looking up the original DBase format -- DBF Files. It's evolved some over the years, but the foundation is quite simple, well documented, and there are lots of utilities that can work on it. It's a perfectly workable database format that deals with all of the fundamental issues with the problem.



StackOverflow isn't for homework.

Having said that, here's the Quick and Dirty way to an efficient, flexible database.

  1. Design a nice Map (HashMap, TreeMap, whatever) that does what you want to do. Often, you'll have a "Record" class with your data, and a number of "Index" objects which are effectively Map<String,List<Record>> collections. (Why a list of records? What about an index on a not-very-selective field?)

  2. Write a class to serialize your collections into files.

  3. Write a class to deserialize your collections from files.

  4. Write your query processing or whatever around the in-memory Java objects.

In-memory database.

Don't like Java's serialization? Get a JSON or YAML library and use those formats to serialize and deserialize.

"But an in-memory database won't scale," the purists whine. Take that up with SQLite, not me. My PC has 2GB of RAM, that's a pretty big database. SQLite works.

By : S.Lott


I would create a database that uses binary tables, one file per table. Take a look at the very handy DataInputStream and DataOutputStream classes. Using them you can easily go back and forth from binary files to Java types.

I would define a simple structure for the table: a header that describes the contents of the table, followed by the row data. Have each column in the table defined in the header - its name, data type, and maximum length. Keep it simple. Only handle a few data types using the capabilities of DataInput/OutputStream as your guide. Use a simple file-naming convention to associate table names to file names.

Create a test table with enough columns to have at least one of each data type. Then, create a simple way to populate tables with data, either by processing input files or via console input. Finally, create a simple way to display the contents of entire tables to the console.

After that, you can add on a very simple version of a SQL-like dialect to do queries. A simple query like this:

SELECT * FROM EMPLOYEES

...would require opening up the file containing the EMPLOYEES table (via your table filename naming convention), parsing the header, and reading through the entire table, returning the contents.

After you get that working, it will be simple to add other functionality such as processing of simple WHERE clauses, returning only the rows (or columns within rows) that match certain criteria.

If it's not necessary to have such a general-purpose solution (any number of tables, any number of columns, an actual query language, etc.) you can simply add methods to your API like:

Employee[] result = EmployeeDataManager.select("LASTNAME", "Smith");

...or something like that. If you build up slowly, dividing your functionality up into several small tasks as I have suggested, soon you will have implemented all of the features you need.

By : shadit


This video can help you solving your question :)
By: admin