This is a collection of tools and libraries I developed to facilitate my research.
DView Framework
The DView framework is a Matlab-based library for operating on tabular data that can exceed available RAM. It offers a workflow similar to Python’s Pandas DataFrames, but is suited to big data applications without requiring additional infrastructure or any adjustments to the user code. For instance, DViews allow me to work on the complete daily CRSP file with a few hundred merged or generated additional variables on a regular desktop PC without having to worry about running out of memory.
The key idea of DViews is to only allow the user to operate on the dataset through a “view” into the underlying data. The distinction between a “view layer” and “data store layer” makes it possible to operate on huge datasets by replacing the default in-memory storage with a disk-based data store under the hood. This disk-based data store is implemented using the HDF5 format, which is a highly efficient data format built for fast I/O and storage of large amounts of heterogeneous data.
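The separation of view layer and data store layer can be illustrated with a minimal sketch. It is written in Python for brevity and is not the DView API: the class names (DataStore, MemoryStore, DiskStore, View) and the method names are assumptions made up for this illustration, and flat binary files stand in for the HDF5 store.

```python
from abc import ABC, abstractmethod
from array import array
import os
import tempfile

ITEM_SIZE = array("d").itemsize  # bytes per stored float


class DataStore(ABC):
    """Data store layer: returns the values of one column for given rows."""

    @abstractmethod
    def n_rows(self): ...

    @abstractmethod
    def read(self, column, rows): ...


class MemoryStore(DataStore):
    """Default store: all columns live in RAM."""

    def __init__(self, columns):
        self._columns = columns  # dict: column name -> list of floats

    def n_rows(self):
        return len(next(iter(self._columns.values())))

    def read(self, column, rows):
        data = self._columns[column]
        return [data[i] for i in rows]


class DiskStore(DataStore):
    """Stand-in for an HDF5 store: one binary file per column, read row by row."""

    def __init__(self, folder, columns):
        self._folder = folder
        self._n = len(next(iter(columns.values())))
        for name, values in columns.items():
            with open(self._path(name), "wb") as fh:
                array("d", values).tofile(fh)

    def _path(self, column):
        return os.path.join(self._folder, column + ".bin")

    def n_rows(self):
        return self._n

    def read(self, column, rows):
        out = []
        with open(self._path(column), "rb") as fh:
            for i in rows:
                fh.seek(ITEM_SIZE * i)  # jump to the requested row only
                value = array("d")
                value.fromfile(fh, 1)
                out.append(value[0])
        return out


class View:
    """View layer: a row selection on top of any DataStore."""

    def __init__(self, store, rows=None):
        self._store = store
        self._rows = list(range(store.n_rows())) if rows is None else rows

    def column(self, name):
        return self._store.read(name, self._rows)

    def where(self, name, predicate):
        values = self.column(name)
        kept = [r for r, v in zip(self._rows, values) if predicate(v)]
        return View(self._store, kept)


def mean_volume_on_down_days(store):
    # User code: identical for both stores.
    volumes = View(store).where("ret", lambda r: r < 0).column("vol")
    return sum(volumes) / len(volumes)


data = {"ret": [0.01, -0.02, 0.03, -0.04], "vol": [100.0, 250.0, 80.0, 500.0]}
with tempfile.TemporaryDirectory() as folder:
    in_memory = mean_volume_on_down_days(MemoryStore(data))
    on_disk = mean_volume_on_down_days(DiskStore(folder, data))
```

Because the user code only ever touches the view, swapping the store changes where the data lives without changing the analysis code.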
I developed the DView framework over many years, adding features on the go as they became necessary during my research. The framework has reached a mature state and is heavily used by several researchers at the finance department. A public version will be made available soon.
Github: Coming soon …
Matlab-to-Stata Library
This library contains a set of functions to interact with Stata from within Matlab. Supported features of the library include:
- Opening and closing a Stata instance
- Sending a dataset from the Matlab workspace to Stata, and vice versa
- Running Stata regression commands
- Returning the regression results back to Matlab as a struct variable
Github: Coming soon …
Matlab-to-Excel Library
This library contains a set of classes and functions to interact with Microsoft Excel from within Matlab. Workbooks, worksheets, and cell ranges are represented as Matlab classes that provide a high-level interface for working with Excel. These classes support operations such as opening, saving, and closing files; creating worksheets; and reading and writing data at specified ranges in a worksheet. The library is useful when interacting with Excel requires more flexibility than the built-in xlsread and xlswrite functions provide.
Github: Coming soon …
Datastream Batch Downloader
A Python wrapper around the REST-based Datastream API offered by the DSWS product. Data of requested series are directly returned as Pandas DataFrames. Large requests are decomposed internally into a series of smaller requests to avoid exceeding product limitations.
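The decomposition of a large request can be sketched roughly as follows. This is not the wrapper's actual code: the function names, the batch size, and the fake fetch function are assumptions for illustration, and plain dictionaries are used in place of Pandas DataFrames to keep the example dependency-free.

```python
def split_into_batches(items, batch_size):
    """Cut a list of series names into consecutive batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def download(series, fetch, batch_size):
    """Request `series` batch by batch and merge the per-batch results."""
    result = {}
    for batch in split_into_batches(series, batch_size):
        result.update(fetch(batch))
    return result


# Hypothetical stand-in for a single API call; it records what was requested.
calls = []


def fake_fetch(batch):
    calls.append(list(batch))
    return {name: [1.0, 2.0] for name in batch}


data = download(["A", "B", "C", "D", "E"], fake_fetch, batch_size=2)
# calls is now [["A", "B"], ["C", "D"], ["E"]]
```

The caller sees one combined result, while each individual request stays below the chosen batch size.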
There is also a set of corresponding Matlab functions that make use of the Python wrapper.
Github: Coming soon …
Function Explorer
GUI to graphically explore the shape of a multi-dimensional numerical function. This works by plotting the function with respect to one or two parameters (x- and y-axis) while holding all remaining parameters constant at a specified level. The fixed levels and the parameters of interest for the x- and y-axis can be set interactively in the GUI.
A typical use case of the Function Explorer is to visualize likelihood functions or pricing models with more than two parameters.
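The slicing idea behind the plots can be sketched in a few lines. The sketch below only computes the values that would be plotted. The helper names slice_1d and slice_2d and the example function are assumptions for illustration, not part of the tool.

```python
def slice_1d(func, fixed, name, grid):
    """Evaluate func along parameter `name`, holding the others at `fixed`."""
    values = []
    for x in grid:
        params = dict(fixed)
        params[name] = x  # overwrite only the parameter of interest
        values.append(func(**params))
    return values


def slice_2d(func, fixed, x_name, x_grid, y_name, y_grid):
    """Rows follow y_grid and columns follow x_grid, as in a surface plot."""
    return [
        slice_1d(func, {**fixed, y_name: y}, x_name, x_grid)
        for y in y_grid
    ]


def example(a, b, c):
    # Hypothetical three-parameter function standing in for a likelihood.
    return a * a + b * c


levels = {"a": 1.0, "b": 2.0, "c": 3.0}  # fixed levels for all parameters
curve = slice_1d(example, levels, "a", [0.0, 1.0, 2.0])
surface = slice_2d(example, levels, "a", [0.0, 1.0], "b", [0.0, 1.0])
# curve   -> [6.0, 7.0, 10.0]
# surface -> [[0.0, 1.0], [3.0, 4.0]]
```

Changing a fixed level or the parameter of interest only changes the arguments, which is what makes the interactive controls cheap to support.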
Github: Coming soon …
CSV File Import Tool
Matlab class to assist with importing large and complex CSV files. To determine the structure of a CSV file, the class supports interactively inspecting a small sample of a file that would be too large to open in a text editor. Separators, column names, and column types can also be auto-detected for regularly structured files.
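Sampling and auto-detection can be approximated with Python's standard csv module. This is a rough sketch rather than the Matlab class itself: the helper names, the candidate separators, the two-way number/text typing, and the assumption that the first row holds the column names are all choices made for this illustration.

```python
import csv
import io
from itertools import islice


def read_sample(stream, n_lines=20):
    """Return the first n_lines of a text stream without reading the rest."""
    return "".join(islice(stream, n_lines))


def guess_type(values):
    """Label a column 'number' if every sampled value parses as a float."""
    try:
        for v in values:
            float(v)
        return "number"
    except ValueError:
        return "text"


def detect_layout(sample):
    # Sniffer guesses the separator from the sample alone.
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    rows = list(csv.reader(io.StringIO(sample), dialect))
    header, body = rows[0], rows[1:]  # assumes the first row is a header
    types = [guess_type(col) for col in zip(*body)]
    return dialect.delimiter, header, types


# A small in-memory stream stands in for a file too large to open.
stream = io.StringIO(
    "date;permno;ret\n"
    "2020-01-02;10001;0.012\n"
    "2020-01-03;10001;-0.004\n"
    "2020-01-06;10002;0.020\n"
)
delimiter, header, types = detect_layout(read_sample(stream, n_lines=3))
```

Only the sampled lines are read, so the same approach works on an open file handle of any size.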
Working with CSV files can be a pain when the import fails due to anomalies in the file. In these cases it can be difficult to pinpoint the exact source of the problem. The tool assists with this by rolling back to the row where the operation failed and displaying its surrounding contents.
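Reporting the failing row together with its neighbours might look like the sketch below. It is an illustration only: ImportFailure and import_numbers are hypothetical names, the text is held in memory for simplicity, and it assumes no quoted field spans several lines, so the reader's line counter matches the line number in the file.

```python
import csv


class ImportFailure(Exception):
    """Carries the failing line number and the lines around it."""

    def __init__(self, line_no, context, cause):
        super().__init__(f"line {line_no}: {cause}")
        self.line_no = line_no
        self.context = context


def import_numbers(text, column, context=1):
    """Parse one numeric column; on failure, report the surrounding lines."""
    lines = text.splitlines()
    reader = csv.reader(lines)  # csv.reader accepts any iterable of strings
    next(reader)  # skip the header row
    values = []
    for row in reader:
        try:
            values.append(float(row[column]))
        except (ValueError, IndexError) as err:
            line_no = reader.line_num  # 1-based line that was just read
            start = max(line_no - 1 - context, 0)
            nearby = lines[start:line_no + context]
            raise ImportFailure(line_no, nearby, err) from err
    return values


text = "id,price\n1,10.5\n2,n/a\n3,11.0\n"
caught = None
try:
    import_numbers(text, column=1)
except ImportFailure as failure:
    caught = failure
# caught.line_no -> 3
# caught.context -> ["1,10.5", "2,n/a", "3,11.0"]
```

For a file too large for memory, the same idea would need a small buffer of recent lines instead of the full list.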
Github: Coming soon …