Integrated access to legal literature through automated semantic classification

Francesconi, E.; Peruginelli, G.

doi:10.1007/s10506-008-9072-6


Published: 11 December 2008

Volume 17, pages 31–49 (2009)

Artificial Intelligence and Law



Abstract

Access to legal information, and in particular to legal literature, is examined with a view to creating a search and retrieval system for Italian legal literature. The design and implementation of services such as integrated access to a wide range of resources are described, with particular emphasis on exploiting the metadata assigned to disparate legal materials. The main purpose of the system is to integrate structured repositories and Web documents: it is built as a federation system with service-provider functions, aiming to create a centralized index of legal resources. The index is based on a uniform metadata view, created for structured data by means of the OAI approach and for Web documents by a machine learning approach, whose document classification performance is assessed in this paper. Semantic searching is a major requirement for users of legal literature, and a solution has been implemented that exploits Dublin Core metadata together with legal ontologies and related terms prepared for accessing the indexed articles.
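The abstract states that structured repositories feed the index through the OAI approach. As a hedged illustration of what that involves, the sketch below harvests Dublin Core records with the standard OAI-PMH `ListRecords` verb and the `oai_dc` metadata prefix, following resumption tokens across pages. The function names and the injectable `fetch` parameter are assumptions made for this example (the parameter also allows testing without network access). This is not the system's actual harvester.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional, Union

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

DCFields = dict[str, list[str]]


def parse_list_records(xml: Union[str, bytes]) -> tuple[list[DCFields], Optional[str]]:
    """Extract DC fields from one ListRecords response, plus the resumption token (if any)."""
    root = ET.fromstring(xml)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find(".//oai_dc:dc", NS)
        if dc is None:  # deleted records have a header but no metadata
            continue
        fields: DCFields = {}
        for el in dc:
            name = el.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
            fields.setdefault(name, []).append((el.text or "").strip())
        records.append(fields)
    token_el = root.find(".//oai:resumptionToken", NS)
    # An empty resumptionToken element marks the last page of the list.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return records, token or None


def _http_get(url: str) -> bytes:
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def harvest(base_url: str, fetch: Callable[[str], bytes] = _http_get) -> Iterator[DCFields]:
    """Yield every oai_dc record exposed by an OAI-PMH repository."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    while True:
        records, token = parse_list_records(fetch(f"{base_url}?{urllib.parse.urlencode(params)}"))
        yield from records
        if token is None:
            break
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Each harvested item is a plain mapping from DC element name to its values, which is the shape a uniform metadata view can be built from.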

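What follows is a minimal sketch of the uniform metadata view and of ontology-based query expansion described in the abstract. It assumes that records from structured repositories already carry Dublin Core elements, while Web documents receive their `subject` from a classifier. `DCRecord`, `from_oai`, `from_web`, the `RELATED` term map, and the example records are all hypothetical. A real legal ontology would be far richer than a dictionary of related terms.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DCRecord:
    """Uniform view: the few Dublin Core elements every indexed item must expose."""
    identifier: str
    title: str
    subject: list[str] = field(default_factory=list)
    source: str = ""  # provider the record came from


def from_oai(dc: dict[str, list[str]], provider: str) -> DCRecord:
    # Structured repositories already expose DC, so the mapping is direct.
    return DCRecord(dc["identifier"][0], dc.get("title", [""])[0],
                    list(dc.get("subject", [])), provider)


def from_web(url: str, title: str, text: str, classify: Callable[[str], str]) -> DCRecord:
    # Web pages carry no reliable subject metadata: a classifier predicts it from the text.
    return DCRecord(url, title, [classify(text)], "web")


# Hypothetical ontology fragment: concept -> related terms.
RELATED = {
    "contract": ["obligation", "agreement"],
    "obligation": ["contract", "liability"],
}


def expand(term: str) -> set[str]:
    """Semantic query expansion: the term plus its related terms."""
    term = term.lower()
    return {term, *RELATED.get(term, [])}


def search(index: list[DCRecord], term: str) -> list[str]:
    """Identifiers of records whose dc:subject matches the expanded query."""
    wanted = expand(term)
    return [r.identifier for r in index if wanted & {s.lower() for s in r.subject}]


index = [
    from_oai({"identifier": ["oai:dogi:1"], "title": ["On contracts"], "subject": ["contract"]}, "DoGi"),
    from_web("http://example.org/a", "A note on liability", "...", classify=lambda t: "liability"),
    from_web("http://example.org/b", "A note on taxation", "...", classify=lambda t: "tax law"),
]
print(search(index, "obligation"))  # ['oai:dogi:1', 'http://example.org/a']
```

A query for "obligation" also retrieves items indexed under "contract" and "liability". Both sources answer the same query because they share one record type.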

Similar content being viewed by others

Introducing Solon: A Semantic Platform for Managing Legal Sources

Mining and Indexing of Legal Natural Language Texts with Domain and Task Ontology

A Semantic Query Engine for Knowledge Rich Legal Digital Libraries

Notes

Legal literature consists of legal intellectual output published in monographs, journal articles, manuals, grey literature, proceedings, etc.
Legislation on the Net (http://www.normeinrete.it).
On-line Public Access Catalogues.
The DoGi database (http://nir.ittig.cnr.it/dogiswish/Index.htm) is one of the most valuable sources for legal literature research in Italy. Created in 1970, it offers abstracts of articles published in the most important legal periodicals (more than 250). Its main goal is to provide law scholars and professionals with exhaustive, up-to-date information on the contents of Italian law reviews.
A publisher metadata format is currently under study; the related DC mapping is therefore not described in this paper.
The Dublin Core/UNIMARC mapping is based on tables prepared by ICCU, Rome: http://www.iccu.sbn.it/Edubluni.htm.
Open Archives Initiative (http://www.openarchives.org/OAI/openarchivesprotocol.htm).
TEL—The European Library (http://www.europeanlibrary.org).
CYCLADES—An Open Collaborative Virtual Archive Environment (http://www.ercim.org/cyclades/).
TORII—The Digital Research Community (http://library.cern.ch/HEPLW/4/papers/4/).
ARC, developed by the Digital Library Group, Old Dominion University.
http://www.openarchives.org.
Most of these are summarized at http://www.lub.lu.se/tk/metadata/dctoollist.html.
These classes, organized as a single-tier (flat) set, were chosen to test the approach. The set of classes can be extended by retraining the classifiers on the new set of classes, and a hierarchical organization of the classes can be handled by a set of classifiers arranged in the same hierarchy as the classes.
We used the MSVM implementation at http://www.csie.ntu.edu.tw/~cjlin/bsvm/index.html.
Swish-e, Simple Web Indexing System for Humans—Enhanced (http://swish-e.org).
See http://nir.ittig.cnr.it/dogiswish/consistenze/class2000Eng.htm.
Similar arguments apply to the MBDQ modality.
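The notes mention a multiclass SVM (MSVM) implementation used for classification. To illustrate one common multiclass SVM formulation, the Crammer-Singer loss, the sketch below trains a linear model in the primal by stochastic subgradient descent. It does not reproduce the cited package's solver, and this section does not state which formulation the authors selected. Feature extraction (for example, TF-IDF weighting of document terms) is omitted. `lam`, `lr`, and `epochs` are arbitrary illustrative values.

```python
from typing import Sequence

Vector = list[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def train_cs_svm(X: list[Vector], y: list[int], n_classes: int,
                 lam: float = 0.01, lr: float = 0.1, epochs: int = 50) -> list[Vector]:
    """Linear multiclass SVM with the Crammer-Singer hinge loss, trained by SGD.

    Per-example loss: max(0, 1 + max_{r != y} w_r.x - w_y.x), plus (lam/2) * ||W||^2.
    Append a constant 1.0 to each x to learn per-class biases.
    """
    dim = len(X[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(epochs):
        for x, label in zip(X, y):
            scores = [dot(w, x) for w in W]
            # Highest-scoring wrong class: the only rival that enters the loss.
            rival = max((r for r in range(n_classes) if r != label), key=lambda r: scores[r])
            violated = 1.0 + scores[rival] - scores[label] > 0
            # Subgradient step on the L2 term: shrink every class vector.
            for w in W:
                for j in range(dim):
                    w[j] *= 1.0 - lr * lam
            if violated:  # hinge term active: pull w_y toward x, push w_rival away
                for j in range(dim):
                    W[label][j] += lr * x[j]
                    W[rival][j] -= lr * x[j]
    return W


def predict(W: list[Vector], x: Vector) -> int:
    scores = [dot(w, x) for w in W]
    return max(range(len(W)), key=lambda r: scores[r])


# Toy data: two features plus a constant bias feature, one point per class.
X = [[2.0, 0.0, 1.0], [0.0, 2.0, 1.0], [-2.0, -2.0, 1.0]]
y = [0, 1, 2]
W = train_cs_svm(X, y, n_classes=3)
print([predict(W, x) for x in X])  # [0, 1, 2]
```

Unlike one-versus-rest schemes, this formulation trains all class vectors jointly against a single loss. Each update involves only the true class and its strongest rival.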

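One of the notes explains that the single-tier class set could be organized hierarchically by arranging classifiers in the same hierarchy as the classes. The following is a minimal sketch of that idea: each internal node of a class tree holds a classifier that picks one child, and a document descends from the root to a leaf. The class names and the keyword-based stand-in classifiers are hypothetical placeholders for trained models.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# A classifier maps a document's text to the name of one child class.
Classifier = Callable[[str], str]


@dataclass
class Node:
    name: str
    classifier: Optional[Classifier] = None  # None for leaves
    children: dict[str, "Node"] = field(default_factory=dict)


def classify_path(node: Node, doc: str) -> list[str]:
    """Descend from the root, letting each internal node's classifier choose a child."""
    path = []
    while node.children:
        node = node.children[node.classifier(doc)]
        path.append(node.name)
    return path


def keyword_classifier(rules: dict[str, str], default: str) -> Classifier:
    """Stand-in for a trained per-node model (e.g. one multiclass SVM per node)."""
    def predict(doc: str) -> str:
        text = doc.lower()
        return next((label for kw, label in rules.items() if kw in text), default)
    return predict


# Hypothetical two-level class hierarchy.
tree = Node("root", keyword_classifier({"contract": "private", "tort": "private"}, "public"), {
    "private": Node("private", keyword_classifier({"tort": "torts"}, "contracts"),
                    {"contracts": Node("contracts"), "torts": Node("torts")}),
    "public": Node("public", keyword_classifier({"constitution": "constitutional"}, "administrative"),
                   {"administrative": Node("administrative"), "constitutional": Node("constitutional")}),
})

print(classify_path(tree, "Tort liability for defective products"))  # ['private', 'torts']
```

A flat, single-tier class set is the special case of a root whose children are all leaves. Adding classes at one level means retraining only the classifier of the parent node.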


Acknowledgements

Special thanks go to Dr. Anna Archi, senior researcher at ITTIG-CNR, who dedicated her research work to services for retrieving legal literature.

Author information

Authors and Affiliations

Institute of Legal Theory and Techniques, Italian National Research Council (ITTIG-CNR), Florence, Italy
E. Francesconi & G. Peruginelli


Corresponding author

Correspondence to E. Francesconi.


About this article

Cite this article

Francesconi, E., Peruginelli, G. Integrated access to legal literature through automated semantic classification. Artif Intell Law 17, 31–49 (2009). https://doi.org/10.1007/s10506-008-9072-6


Received: 01 August 2008
Accepted: 01 December 2008
Published: 11 December 2008
Issue Date: March 2009
DOI: https://doi.org/10.1007/s10506-008-9072-6
