com.knowledgebooks
Class API

java.lang.Object
  extended by com.knowledgebooks.API

public class API
extends java.lang.Object

The API class is a facade wrapper class for the entire KB_bundle product. It is intended to expose most of the functionality of the KB_bundle system through a minimal, easy-to-learn set of APIs.

The home page for the KB_bundle product is http://knowledgebooks.com

Copyright 2002-2008 by Mark Watson. All rights reserved.

This software is not public domain. It can be legally used under either of the following licenses:

1. KnowledgeBooks.com Non Commercial Royalty Free License
2. KnowledgeBooks.com Commercial Use License

See www.knowledgebooks.com for details.


Constructor Summary
API()
          Default constructor stores persistent data in ./temp_data_dir
API(java.lang.String top_data_dir_path, boolean initialize_all_data)
          Constructor stores persistent data in top_data_dir_path
 
Method Summary
 java.lang.String abiwordToPlainText(java.lang.String s)
           
 boolean addInfoSource(java.lang.String uri)
          Add a text information resource to local data stores.
 void close()
          Close database, triple store, and Lucene indices.
 java.lang.String doSparqlQuery(java.lang.String sparql)
          All information resources added to the system are processed to generate RDF data that is automatically loaded into a local Sesame RDF repository.
 void exportAllDatabaseTables(java.lang.String output_dir_path)
          Write out all local embedded relational database data, one file per table, to the specified directory path.
 void exportAllRdfAsN3(java.lang.String output_file_path)
          Write out all local RDF data to a file in the N3 format.
 void exportAllRdfAsTriples(java.lang.String output_file_path)
          Write out all local RDF data to a file in the NTriples format.
 java.lang.String htmlToPlainText(java.lang.String s)
           
static void main(java.lang.String[] args)
          Main method for using KB_bundle as a command line utility program
 java.lang.String openofficeToPlainText(java.lang.String s)
           
 java.lang.String pdfToPlainText(java.lang.String s)
           
 java.lang.String powerpointToPlainText(java.lang.String s)
           
 java.util.List<java.lang.String> searchAllTextForUris(java.lang.String lucene_query_string)
          All information resources added to the system are indexed using Lucene.
 java.util.List<java.lang.String[]> searchAllTextForUrisAndMatchedText(java.lang.String lucene_query_string)
          All information resources added to the system are indexed using Lucene.
 java.util.List<java.lang.String> tag(java.util.ArrayList<java.lang.String> tokens)
           
 java.util.List<java.lang.String> tokenize(java.lang.String text)
           
 java.lang.String wordToPlainText(java.lang.String s)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

API

public API()
    throws java.lang.Exception
Default constructor stores persistent data in ./temp_data_dir

Throws:
java.lang.Exception

API

public API(java.lang.String top_data_dir_path,
           boolean initialize_all_data)
    throws java.lang.Exception
Constructor stores persistent data in top_data_dir_path

Parameters:
top_data_dir_path - directory in which the Sesame RDF, Lucene, and embedded database files are stored (overrides the default ./temp_data_dir)
initialize_all_data - if true, discard all previous Sesame RDF, Lucene, and embedded database data and re-initialize the data stores
Throws:
java.lang.Exception
Method Detail

close

public void close()
Close database, triple store, and Lucene indices.

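The class is documented as extending java.lang.Object with no listed interfaces, so it cannot be used directly in a try-with-resources statement. A minimal sketch, assuming you want close() to run even when an exception is thrown: the hypothetical CloseGuard adapter below wraps any close action (for example api::close) in java.lang.AutoCloseable and runs it at most once.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (not part of KB_bundle): adapts any no-argument close
 * action, such as com.knowledgebooks.API::close, to java.lang.AutoCloseable.
 */
public final class CloseGuard implements AutoCloseable {
    private final Runnable closeAction;
    private boolean closed = false;

    public CloseGuard(Runnable closeAction) {
        this.closeAction = closeAction;
    }

    @Override
    public void close() {
        // Run the wrapped close action at most once, even if close() is called twice.
        if (!closed) {
            closed = true;
            closeAction.run();
        }
    }

    public static void main(String[] args) {
        AtomicInteger closes = new AtomicInteger();
        // With the real library this would look like:
        //   API api = new API("my_data_dir", false);
        //   try (CloseGuard guard = new CloseGuard(api::close)) { ... }
        try (CloseGuard guard = new CloseGuard(closes::incrementAndGet)) {
            System.out.println("working with the data stores");
        }
        System.out.println("close() calls: " + closes.get());
    }
}
```

Because the API constructors are declared to throw java.lang.Exception, the API instance is best created before the try block, so that the guard only wraps an object that was constructed successfully.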

addInfoSource

public boolean addInfoSource(java.lang.String uri)
Add a text information resource to local data stores. Entity extraction, summarization, and similarity clustering are performed as information is added.

Parameters:
uri - the URI can refer to a resource either on the local file system or on the web
Returns:
true if the resource was added successfully to the Sesame RDF repository, the Lucene index, and the embedded database

searchAllTextForUris

public java.util.List<java.lang.String> searchAllTextForUris(java.lang.String lucene_query_string)
                                                      throws java.lang.Exception
All information resources added to the system are indexed using Lucene. This utility method performs a search for original information source URIs using Lucene's default query syntax.

Parameters:
lucene_query_string - a query string in Lucene's default query syntax
Returns:
a list of strings that are information source URIs
Throws:
java.lang.Exception
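In Lucene's default (classic QueryParser) syntax, characters such as : + - " ( ) * ? are operators, so user-supplied text that contains them should be escaped before it is placed in lucene_query_string. A minimal sketch, assuming the classic QueryParser character set (the same set Lucene's own QueryParser.escape handles); LuceneQueries is a hypothetical helper, not part of KB_bundle.

```java
public final class LuceneQueries {
    // Characters treated as operators by Lucene's classic QueryParser syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Backslash-escapes query-syntax characters so the text is matched literally. */
    public static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A phrase query AND a literal term; without escaping, "rdf:type" would be
        // parsed as a search for "type" in a field named "rdf".
        String query = "\"semantic web\" AND " + escape("rdf:type");
        System.out.println(query);
        // With the real library: List<String> uris = api.searchAllTextForUris(query);
    }
}
```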

searchAllTextForUrisAndMatchedText

public java.util.List<java.lang.String[]> searchAllTextForUrisAndMatchedText(java.lang.String lucene_query_string)
                                                                      throws java.lang.Exception
All information resources added to the system are indexed using Lucene. This utility method performs a search that returns both the original source URIs and the matched text, using Lucene's default query syntax.

Parameters:
lucene_query_string - a query string in Lucene's default query syntax
Returns:
a list of String[] pairs, each holding an information source URI and its matched text
Throws:
java.lang.Exception
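Working with raw String[] pairs is error-prone, so a caller may prefer to group the results by source. A minimal sketch, assuming element 0 of each pair is the URI and element 1 is the matched text (the order given in the Returns description above); MatchPairs and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class MatchPairs {
    /**
     * Groups {uri, text} pairs by URI, preserving the original result order.
     * ASSUMPTION: pair[0] is the source URI and pair[1] is the matched text.
     */
    public static Map<String, List<String>> bySource(List<String[]> pairs) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            if (pair == null || pair.length < 2) {
                continue; // skip malformed entries defensively
            }
            result.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample; with the real library:
        // List<String[]> pairs = api.searchAllTextForUrisAndMatchedText(query);
        List<String[]> sample = Arrays.<String[]>asList(
                new String[] {"file:///docs/a.txt", "first match"},
                new String[] {"file:///docs/a.txt", "second match"},
                new String[] {"http://example.com/b.html", "third match"});
        System.out.println(bySource(sample));
    }
}
```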

doSparqlQuery

public java.lang.String doSparqlQuery(java.lang.String sparql)
                               throws java.lang.Exception
All information resources added to the system are processed to generate RDF data that is automatically loaded into a local Sesame RDF repository. This utility method is used to execute SPARQL queries against this RDF data store.

Parameters:
sparql - a SPARQL query string
Returns:
the query results, as a String
Throws:
java.lang.Exception
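The RDF vocabulary that KB_bundle generates is not documented here, so a practical first query lists the distinct predicates in the store. A minimal sketch of building such query strings; SparqlQueries is hypothetical, the IRI in main is a placeholder, and the IRI check follows the characters the SPARQL grammar forbids inside <...> so that caller input cannot break out of the query.

```java
public final class SparqlQueries {
    /** Lists the distinct predicates in the store, to discover the generated vocabulary. */
    public static String distinctPredicates(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        return "SELECT DISTINCT ?p WHERE { ?s ?p ?o } LIMIT " + limit;
    }

    /** Lists every predicate/object pair for the given subject IRI. */
    public static String describeSubject(String iri) {
        // Reject characters the SPARQL grammar does not allow inside an IRI reference.
        for (char c : iri.toCharArray()) {
            if (c <= 0x20 || "<>\"{}|^`\\".indexOf(c) >= 0) {
                throw new IllegalArgumentException("invalid IRI character: " + c);
            }
        }
        return "SELECT ?p ?o WHERE { <" + iri + "> ?p ?o }";
    }

    public static void main(String[] args) {
        System.out.println(distinctPredicates(50));
        System.out.println(describeSubject("http://example.com/doc1"));
        // With the real library: String results = api.doSparqlQuery(distinctPredicates(50));
    }
}
```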

exportAllRdfAsN3

public void exportAllRdfAsN3(java.lang.String output_file_path)
Write out all local RDF data to a file in the N3 format.

Parameters:
output_file_path - path of the N3 file to write

exportAllRdfAsTriples

public void exportAllRdfAsTriples(java.lang.String output_file_path)
Write out all local RDF data to a file in the NTriples format.

Parameters:
output_file_path - path of the NTriples file to write
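In the NTriples format every statement sits on its own line as subject, predicate, and object followed by " .", which makes exported files easy to process line by line. A minimal sketch that formats one such line, assuming a string-literal object and RDF 1.1 N-Triples escaping rules; NTriples and the example IRIs are hypothetical and not taken from KB_bundle's output.

```java
public final class NTriples {
    /** Formats one N-Triples statement whose object is a plain string literal. */
    public static String tripleWithLiteral(String subjectIri, String predicateIri, String literal) {
        return "<" + subjectIri + "> <" + predicateIri + "> \"" + escapeLiteral(literal) + "\" .";
    }

    /** Escapes characters that may not appear raw inside an N-Triples string literal. */
    static String escapeLiteral(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(tripleWithLiteral(
                "http://example.com/doc1",
                "http://purl.org/dc/elements/1.1/title",
                "A \"quoted\" title"));
    }
}
```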

exportAllDatabaseTables

public void exportAllDatabaseTables(java.lang.String output_dir_path)
Write out all local embedded relational database data, one file per table, to the specified directory path.

Parameters:
output_dir_path - path of the directory that will contain the output files (one file per table)
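Whether exportAllDatabaseTables creates a missing output directory, and how it names the per-table files, is not documented here. A minimal sketch, assuming you want to create the directory up front and then inspect what was written; ExportDir is hypothetical, and the placeholder file in main only stands in for real export output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class ExportDir {
    /** Creates the output directory and any missing parents. */
    public static Path prepare(Path dir) throws IOException {
        return Files.createDirectories(dir);
    }

    /** Returns the sorted names of regular files in the directory (one per exported table). */
    public static List<String> listFiles(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                if (Files.isRegularFile(p)) {
                    names.add(p.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path out = prepare(Files.createTempDirectory("kb_export").resolve("tables"));
        // With the real library: api.exportAllDatabaseTables(out.toString());
        // Placeholder file for this demo only:
        Files.write(out.resolve("example_table.txt"), "stand-in".getBytes(StandardCharsets.UTF_8));
        System.out.println(listFiles(out));
    }
}
```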

tokenize

public java.util.List<java.lang.String> tokenize(java.lang.String text)

tag

public java.util.List<java.lang.String> tag(java.util.ArrayList<java.lang.String> tokens)
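The signatures show a type mismatch worth noting: tokenize returns java.util.List<String>, while tag is declared to take java.util.ArrayList<String>, so the tokenizer's result cannot be passed to tag without a cast or a copy. A minimal sketch, assuming copying is acceptable; TokenLists is a hypothetical helper and the token list in main is sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class TokenLists {
    /** Copies any List<String> into the ArrayList<String> type that tag(...) is declared to accept. */
    public static ArrayList<String> toArrayList(List<String> tokens) {
        return new ArrayList<>(tokens);
    }

    public static void main(String[] args) {
        // Arrays.asList returns a fixed-size List that is NOT a java.util.ArrayList,
        // so it could not be passed directly to a parameter of type ArrayList<String>.
        List<String> tokens = Arrays.asList("KB_bundle", "extracts", "entities");
        ArrayList<String> forTagger = toArrayList(tokens);
        System.out.println(forTagger.getClass().getName() + " " + forTagger);
        // With the real library:
        // List<String> tags = api.tag(toArrayList(api.tokenize(text)));
    }
}
```

Copying always, rather than casting when the list happens to be an ArrayList, avoids a ClassCastException if tokenize ever returns a different List implementation.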

wordToPlainText

public java.lang.String wordToPlainText(java.lang.String s)

pdfToPlainText

public java.lang.String pdfToPlainText(java.lang.String s)

htmlToPlainText

public java.lang.String htmlToPlainText(java.lang.String s)

openofficeToPlainText

public java.lang.String openofficeToPlainText(java.lang.String s)

powerpointToPlainText

public java.lang.String powerpointToPlainText(java.lang.String s)

abiwordToPlainText

public java.lang.String abiwordToPlainText(java.lang.String s)

main

public static void main(java.lang.String[] args)
Main method for using KB_bundle as a command line utility program.
 Command line options:

   -text_2_rdf   input: text file name; output: N3 format RDF file
   -summarize    input: text file name; output: summary text file

Parameters:
args - the command line arguments
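The exact argument syntax for the two options is not spelled out above. A minimal sketch, ASSUMING each option is followed by exactly two arguments (the input text file name, then the output file name), of checking arguments before handing them to main; KbArgs and the file names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

public final class KbArgs {
    private static final Set<String> OPTIONS =
            new HashSet<>(Arrays.asList("-text_2_rdf", "-summarize"));

    /**
     * Validates arguments of the ASSUMED form: option inputFile outputFile.
     * Returns an error message, or Optional.empty() when the arguments look valid.
     */
    public static Optional<String> validate(String[] args) {
        if (args.length != 3) {
            return Optional.of("usage: (-text_2_rdf | -summarize) <input text file> <output file>");
        }
        if (!OPTIONS.contains(args[0])) {
            return Optional.of("unknown option: " + args[0]);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String[] example = {"-summarize", "report.txt", "report_summary.txt"};
        Optional<String> error = validate(example);
        System.out.println(error.orElse("ok: " + String.join(" ", example)));
        // With the real library: com.knowledgebooks.API.main(example);
    }
}
```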