Using data analysis and Information visualization techniques to support the effective analysis of large financial data sets
- Authors: Nyumbeka, Dumisani Joshua
- Date: 2016
- Subjects: Information visualization , Finance -- Mathematical models , Database management
- Language: English
- Type: Thesis , Masters , MCom
- Identifier: http://hdl.handle.net/10948/12983 , vital:27141
- Description: There have been a number of technological advances in the last ten years, which have resulted in the amount of data generated in organisations increasing by more than 200% during this period. This rapid increase means that if financial institutions are to derive significant value from this data, they need to identify new ways to analyse it effectively. Due to the considerable size of the data, financial institutions also need to consider how to visualise it effectively. Traditional tools such as relational database management systems have problems processing large amounts of data due to memory constraints, latency issues and the presence of both structured and unstructured data. The aim of this research was to use data analysis and information visualisation (IV) techniques to support the effective analysis of large financial data sets. In order to analyse the data visually and effectively, the underlying data model must produce reliable results. A large financial data set was identified and used to demonstrate that IV techniques can support the effective analysis of large financial data sets. A review of the literature on large financial data sets, visual analytics, and existing data management and data visualisation tools identified the shortcomings of existing tools. This led to the determination of the requirements for the data management tool and the IV tool. The data management tool identified was a data warehouse, and the IV toolkit identified was Tableau. The IV techniques identified included the Overview, Dashboards and Colour Blending. The IV tool was implemented and published online, and can be accessed through a web browser interface. The data warehouse and the IV tool were evaluated to determine their accuracy and effectiveness in supporting the effective analysis of the large financial data set.
The experiment used to evaluate the data warehouse yielded positive results, showing that only about 4% of the records had incorrect data. The results of the user study were positive and no major usability issues were identified. The participants found the IV techniques effective for analysing the large financial data set.
- Full Text:
- Date Issued: 2016
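The evaluation above reports the proportion of warehouse records holding incorrect data. A minimal sketch of how such a record-validation pass might look is given below; the column names and integrity rules are entirely illustrative assumptions, not the thesis's actual checks.

```python
import re

# Hypothetical well-formedness rule for account identifiers (illustrative only).
ACCOUNT_RE = re.compile(r"^[A-Z]{2}\d{6}$")

def record_is_valid(record: dict) -> bool:
    """Basic integrity checks on one record: assumed rules, not the thesis's."""
    amount = record.get("amount")
    account = record.get("account_id", "")
    return (
        isinstance(amount, (int, float))
        and amount >= 0                            # no negative transaction amounts
        and ACCOUNT_RE.match(account) is not None  # well-formed account ID
    )

def error_rate(records: list[dict]) -> float:
    """Fraction of records failing validation."""
    return sum(not record_is_valid(r) for r in records) / len(records)

records = [
    {"account_id": "ZA000001", "amount": 150.0},
    {"account_id": "ZA000002", "amount": -20.0},   # invalid: negative amount
    {"account_id": "BAD-ID", "amount": 75.5},      # invalid: malformed ID
    {"account_id": "ZA000004", "amount": 300.0},
]
print(f"{error_rate(records):.0%} of records failed validation")
```

In practice such rules would be derived from the source systems feeding the warehouse, and the failing fraction compared against a quality threshold.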
The impact of domain knowledge-driven variable derivation on classifier performance for corporate data mining
- Authors: Welcker, Laura Joana Maria
- Date: 2015
- Subjects: Data mining , Business -- Data processing , Database management
- Language: English
- Type: Thesis , Doctoral , DPhil
- Identifier: http://hdl.handle.net/10948/5009 , vital:20778
- Description: Technological progress, in terms of increasing computational power and growing virtual space to collect data, offers great potential for businesses to benefit from data mining applications. Data mining can create a competitive advantage for corporations by discovering business-relevant information, such as patterns, relationships, and rules. The role of the human user within the data mining process is crucial, which is why the research area of domain knowledge is becoming increasingly important. This thesis investigates the impact of domain knowledge-driven variable derivation on classifier performance for corporate data mining. Domain knowledge is defined as methodological, data and business know-how. The thesis investigates the topic from a new perspective by shifting the focus from a one-sided approach, namely a purely analytic or purely theoretical one, towards a target group-oriented (researcher and practitioner) approach which places the methodological aspect, in the form of a scientific guideline, at the centre of the research. In order to ensure the feasibility and practical relevance of the guideline, it is adapted and applied to the requirements of a practical business case. Thus, the thesis examines the topic from both a theoretical and a practical perspective, and thereby overcomes the limitation of a one-sided approach, which mostly lacks practical relevance or generalisability of results. The primary objective of this thesis is to provide a scientific guideline that enables both practitioners and researchers to advance domain knowledge-driven research on variable derivation on a corporate basis. In the theoretical part, a broad overview is given of the main aspects necessary to undertake the research, such as the concept of domain knowledge, the data mining task of classification, variable derivation as a subtask of data preparation, and evaluation techniques.
This part of the thesis addresses the methodological aspect of domain knowledge. In the practical part, a research design is developed for testing six hypotheses related to domain knowledge-driven variable derivation. The major contribution of the empirical study is testing the impact of domain knowledge on a real business data set, compared to the impact of a standard and a randomly derived data set. The business application of the research is a binary classification problem in the domain of an insurance business, dealing with the prediction of damages in legal expenses insurance. Domain knowledge is expressed by deriving the corporate variables by means of the business- and data-driven constructive induction strategy. Six variable derivation steps are investigated: normalisation, instance relation, discretisation, categorical encoding, ratio, and multivariate mathematical function. The impact of domain knowledge is examined by pairwise (with and without derived variables) performance comparisons for five classification techniques (decision trees, naive Bayes, logistic regression, artificial neural networks, k-nearest neighbours). The impact is measured by two classifier performance criteria: sensitivity and the area under the ROC curve (AUC). The McNemar significance test is used to verify the results. Based on the results, two hypotheses are clearly verified and accepted, three hypotheses are partly verified, and one hypothesis had to be rejected on the basis of the case study results. The thesis reveals a significant positive impact of domain knowledge-driven variable derivation on classifier performance for options of all six tested steps. Furthermore, the findings indicate that the classification technique influences the impact of the variable derivation steps, and that bundling steps has a significantly higher performance impact if the variables are derived using domain knowledge (compared to a non-knowledge application).
Finally, the research shows that an empirical examination of the domain knowledge impact is very complex, due to a high level of interaction between the selected research parameters (variable derivation step, classification technique, and performance criterion).
- Full Text:
- Date Issued: 2015
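The evaluation design described above can be sketched in a few lines: the same classifier is scored with and without the derived variables on identical test instances, sensitivity is compared, and McNemar's test is applied to the paired errors. The predictions below are illustrative stand-ins, not the thesis's data.

```python
def sensitivity(y_true, y_pred):
    """True-positive rate: fraction of actual positives predicted positive."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)

def mcnemar_statistic(y_true, pred_a, pred_b):
    """McNemar chi-square with continuity correction on paired predictions.

    b = instances only classifier A got right, c = instances only B got right.
    """
    b = sum(a == t != p for t, a, p in zip(y_true, pred_a, pred_b))
    c = sum(p == t != a for t, a, p in zip(y_true, pred_a, pred_b))
    return 0.0 if b + c == 0 else (abs(b - c) - 1) ** 2 / (b + c)

y_true       = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
with_derived = [1, 1, 1, 1, 0, 0, 0, 1, 1, 0]  # classifier with derived variables
without      = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]  # same classifier, base variables only

print("sensitivity with/without:",
      sensitivity(y_true, with_derived), sensitivity(y_true, without))
print("McNemar statistic:", mcnemar_statistic(y_true, with_derived, without))
# A statistic above 3.84 would indicate a difference significant at the 5% level.
```

The thesis applies this pairwise scheme across five classification techniques and six derivation steps; the AUC comparison follows the same with/without pattern with a different metric.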
Enhanced visualisation techniques to support access to personal information across multiple devices
- Authors: Beets, Simone Yvonne
- Date: 2014
- Subjects: Information visualisation , Database management , Web services , Personal information management
- Language: English
- Type: Thesis , Doctoral , PhD
- Identifier: vital:10500 , http://hdl.handle.net/10948/d1021136
- Description: The increasing number of devices owned by a single user makes it increasingly difficult to access, organise and visualise personal information (PI), i.e. documents and media, across these devices. The primary method that is currently used to organise and visualise PI is the hierarchical folder structure, which is a familiar and widely used means to manage PI. However, this hierarchy does not effectively support personal information management (PIM) across multiple devices. Current solutions, such as the Personal Information Dashboard and Stuff I’ve Seen, do not support PIM across multiple devices. Alternative PIM tools, such as Dropbox and TeamViewer, attempt to provide a means of accessing PI across multiple devices, but these solutions also suffer from several limitations. The aim of this research was to investigate to what extent enhanced information visualisation (IV) techniques could be used to support accessing PI across multiple devices. An interview study was conducted to identify how PI is currently managed across multiple devices. This interview study further motivated the need for a tool to support visualising PI across multiple devices and identified requirements for such an IV tool. Several suitable IV techniques were selected and enhanced to support PIM across multiple devices. These techniques comprised an Overview using a nested circles layout, a Tag Cloud and a Partition Layout, which used a novel set-based technique. A prototype, called MyPSI, was designed and implemented incorporating these enhanced IV techniques. The requirements and design of the MyPSI prototype were validated using a conceptual walkthrough. The design of the MyPSI prototype was initially implemented for a desktop or laptop device with mouse-based interaction. A sample personal space of information (PSI) was used to evaluate the prototype in a controlled user study. The user study was used to identify any usability problems with the MyPSI prototype. 
The results were highly positive and the participants agreed that such a tool could be useful in future. No major problems were identified with the prototype. The MyPSI prototype was then implemented on a mobile device, specifically an Android tablet device, using a similar design, but supporting touch-based interaction. Users were allowed to upload their own PSI using Dropbox, which was visualised by the MyPSI prototype. A field study was conducted following the Multi-dimensional In-depth Long-term Case Studies approach specifically designed for IV evaluation. The field study was conducted over a two-week period, evaluating both the desktop and mobile versions of the MyPSI prototype. Both versions received positive results, but the desktop version was slightly preferred over the mobile version, mainly due to familiarity and problems experienced with the mobile implementation. Design recommendations were derived to inform future designs of IV tools to support accessing PI across multiple devices. This research has shown that IV techniques can be enhanced to effectively support accessing PI across multiple devices. Future work will involve customising the MyPSI prototype for mobile phones and supporting additional platforms.
- Full Text:
- Date Issued: 2014
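The Overview technique mentioned above uses a nested circles layout: child circles (e.g. folders and files) are drawn inside their parent circle. A minimal geometric sketch of one such nesting step is shown below; the simple ring placement is an illustrative assumption, not the MyPSI prototype's actual layout algorithm.

```python
import math

def ring_layout(parent_r, n):
    """Place n equal child circles on a ring inside a parent of radius parent_r.

    Returns (child_radius, [(x, y), ...]) with the parent centred at (0, 0).
    Adjacent children on a ring of radius d are 2*d*sin(pi/n) apart, so the
    largest non-overlapping child radius is r = d*sin(pi/n); containment in
    the parent requires d + r <= parent_r, giving d = parent_r / (1 + sin(pi/n)).
    """
    if n == 1:
        return parent_r / 2, [(0.0, 0.0)]
    s = math.sin(math.pi / n)
    d = parent_r / (1 + s)   # ring radius
    r = d * s                # child radius
    centres = [
        (d * math.cos(2 * math.pi * k / n), d * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]
    return r, centres

r, centres = ring_layout(1.0, 4)
print(f"4 children: radius {r:.3f} at {[(round(x, 3), round(y, 3)) for x, y in centres]}")
```

A full nested-circles overview would apply this recursively, sizing children by file count or size rather than equally.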
Extensibility in ORDBMS databases : an exploration of the data cartridge mechanism in Oracle9i
- Authors: Ndakunda, Tulimevava Kaunapawa
- Date: 2013-06-18
- Subjects: Database management , Oracle (Computer file)
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4686 , http://hdl.handle.net/10962/d1008098 , Database management , Oracle (Computer file)
- Description: To support current and emerging database applications, Object-Relational Database Management Systems (ORDBMS) provide mechanisms to extend the data storage capabilities and the functionality of the database with application-specific types and methods. Using these mechanisms, the database may contain user-defined data types, large objects (LOBs), external procedures, extensible indexing, query optimisation techniques and other features that are treated in the same way as built-in database features. The many extensibility options provided by the ORDBMS, however, raise several implementation challenges that are not always obvious. This thesis examines a few of the key challenges that arise when extending the Oracle database with new functionality. To realise the potential of extensibility in Oracle, the thesis used the problem area of image retrieval as the main test domain. Current image retrieval techniques still lag behind the retrieval performance required, but are continuously improving. As better retrieval techniques become available, it is important that they are integrated into the available database systems to facilitate improved retrieval. The thesis also reports on the practical experiences gained from integrating an extensible indexing scenario. Sample scenarios are integrated into the Oracle9i database using the data cartridge mechanism, which allows Oracle database functionality to be extended with new functional components. The integration demonstrates how additional functionality may be effectively applied to both general and specialised domains in the database. It also reveals alternative design options that allow data cartridge developers, most of whom are not database server experts, to extend the database. The thesis concludes with some of the key observations and options that designers must consider when extending the database with new functionality.
The main challenges for developers are the learning curve required to understand the data cartridge framework and the ability to adapt already-developed code within the constraints of the data cartridge, using the provided extensibility APIs. Maximum reusability relies on making good choices for the basic functions, out of which specialised functions can be built.
- Full Text:
Database Management & Design: CSC 224
- Authors: Sibanda, K , Kogeda, P
- Date: 2010-02
- Subjects: Database management
- Language: English
- Type: Examination paper
- Identifier: vital:17752 , http://hdl.handle.net/10353/d1010238
- Description: Database Management & Design: CSC 224, supplementary examination February 2010.
- Full Text: false
- Date Issued: 2010-02
A comparison of open source object-oriented database products
- Authors: Khayundi, Peter
- Date: 2009
- Subjects: Object-oriented databases , Relational databases , Database management , Database selection , Database searching
- Language: English
- Type: Thesis , Masters , MSc (Computer Science)
- Identifier: vital:11384 , http://hdl.handle.net/10353/254 , Object-oriented databases , Relational databases , Database management , Database selection , Database searching
- Description: Object-oriented databases have been gaining popularity over the years. Their ease of use and the advantages that they offer over relational databases have made them a popular choice amongst database administrators. Their use in previous years was restricted to business and administrative applications, but improvements in technology and the emergence of new, data-intensive applications have led to an increase in the use of object databases. This study investigates four open source object-oriented databases on their ability to carry out the standard database operations of storing, querying, updating and deleting database objects. Each of these databases is timed in order to measure which is capable of performing a particular function faster than the others.
- Full Text:
- Date Issued: 2009
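The comparison described above times the four standard operations against each database. A hedged sketch of such a timing harness is shown below; the in-memory store is a stand-in for the object databases compared in the study, and its interface is an illustrative assumption.

```python
import time

class InMemoryStore:
    """Minimal stand-in backend exposing the four benchmarked operations."""
    def __init__(self):
        self._objects = {}
    def store(self, oid, obj):
        self._objects[oid] = obj
    def query(self, oid):
        return self._objects.get(oid)
    def update(self, oid, obj):
        self._objects[oid] = obj
    def delete(self, oid):
        self._objects.pop(oid, None)

def benchmark(store, n=10_000):
    """Return the wall-clock time of each operation applied to n objects."""
    timings = {}
    for name, op in [
        ("store",  lambda i: store.store(i, {"id": i})),
        ("query",  lambda i: store.query(i)),
        ("update", lambda i: store.update(i, {"id": i, "v": 2})),
        ("delete", lambda i: store.delete(i)),
    ]:
        start = time.perf_counter()
        for i in range(n):
            op(i)
        timings[name] = time.perf_counter() - start
    return timings

results = benchmark(InMemoryStore())
for name, seconds in results.items():
    print(f"{name}: {seconds * 1e6 / 10_000:.2f} µs/op")
```

Running the same harness against each candidate database (with identical object counts and sizes) makes the per-operation timings directly comparable.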