Wednesday, March 5, 2008

An HTTP Client

Problem: www.nse-india.com provides historical data for traded shares. Company price data for any day from November 1987 onward can be accessed by filling in a set of online forms in a browser.

For example, the URL for January 22, 2007 is:

http://www.nse-india.com/content/historical/WDM/2007/JAN/wdmlist_22012007.csv

Now, download the data for 10 years automatically and store it in an MS Access database.

My Solution:

1. Create a module to dynamically generate the URLs for each day in the 10-year period.
2. Use Apache's HttpClient to submit a request to each generated URL. HttpClient can be downloaded from http://hc.apache.org/downloads.cgi
3. Create a module to parse the response String into beans. Use Apache DateUtils for the dates.
4. Use JDBC-ODBC connectivity to connect the Java code to MS Access.
5. Create a DAO module that uses java.sql.PreparedStatement to insert the beans into an MS Access table.
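
Step 1 can be sketched roughly as follows. This is only an illustration, not the original module: the class and method names (NseUrlGenerator, urlFor, urlsBetween) are made up here, and the assumption that every month folder is an upper-case three-letter English abbreviation is extrapolated from the single JAN example above. It also generates a URL for every calendar day, so days without a file (weekends, holidays) would presumably have to be handled when the request fails.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class NseUrlGenerator {
    // Base path taken from the example URL in the post.
    private static final String BASE =
            "http://www.nse-india.com/content/historical/WDM/";

    // Assumption: month folders follow the "JAN" pattern seen in the example.
    private static final String[] MONTHS = {
        "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
        "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"
    };

    // ddMMyyyy, e.g. 22012007 for January 22, 2007.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("ddMMyyyy");

    /** Builds the CSV URL for a single day. */
    public static String urlFor(LocalDate day) {
        String month = MONTHS[day.getMonthValue() - 1];
        return BASE + day.getYear() + "/" + month
                + "/wdmlist_" + day.format(STAMP) + ".csv";
    }

    /** Builds one URL per calendar day from first to last, inclusive. */
    public static List<String> urlsBetween(LocalDate first, LocalDate last) {
        List<String> urls = new ArrayList<>();
        for (LocalDate d = first; !d.isAfter(last); d = d.plusDays(1)) {
            urls.add(urlFor(d));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Prints the example URL quoted in the post.
        System.out.println(urlFor(LocalDate.of(2007, 1, 22)));
    }
}
```

Calling urlsBetween with the first and last day of the 10-year period gives the list of URLs to hand to HttpClient in step 2.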

Bugs encountered:

1. Compiler error: Apache HttpClient classes not found.
Fix: Move the jars to the folder /jdk/jre/lib/ext. Jars placed there are picked up without any change to the classpath.
2. Class not found error: HttpClient classes don't get loaded at runtime.
Fix: Move the path entry for the JDK to the beginning of the path string. The cause of the error is that there are several JREs on the system, and the java command was running from a JRE that is not inside the JDK that ran javac, so it never saw the jars placed in that JDK's lib/ext folder.
3. No errors or exceptions, but the data doesn't load into the database.
Fix: Close the PreparedStatement after the insertion.
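
The third fix can be illustrated with a small DAO sketch. It is not the original code: the Trade bean, its fields, and the trades table with its columns are all hypothetical, since the post does not show the real layout. It assumes Java 7 or later, where try-with-resources closes the PreparedStatement automatically even if an insert throws; on older JDKs the same effect needs a finally block that calls close(). The method takes any java.sql.Connection, so it does not depend on which driver is used.

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class TradeDao {
    /** Hypothetical bean; the real CSV columns are not shown in the post. */
    public static class Trade {
        final String security;
        final Date tradeDate;
        final double price;

        public Trade(String security, Date tradeDate, double price) {
            this.security = security;
            this.tradeDate = tradeDate;
            this.price = price;
        }
    }

    // Hypothetical table and column names.
    private static final String INSERT_SQL =
            "INSERT INTO trades (security, trade_date, price) VALUES (?, ?, ?)";

    /** Inserts every bean and returns the number of rows reported as inserted. */
    public static int insertAll(Connection conn, List<Trade> trades)
            throws SQLException {
        // try-with-resources guarantees ps.close() runs when the block exits.
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            int inserted = 0;
            for (Trade t : trades) {
                ps.setString(1, t.security);
                ps.setDate(2, t.tradeDate);
                ps.setDouble(3, t.price);
                inserted += ps.executeUpdate();
            }
            return inserted;
        }
    }
}
```

One statement is prepared once and reused for all rows, then closed exactly once at the end, which is the step that was missing in bug 3.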
